This article will provide a general overview of distributed systems, highlighting their advantages and the challenges involved in designing and maintaining such a system. Furthermore, we will explain why blockchain technology presents itself as a viable alternative to traditional distributed systems, underlining how it can overcome a series of challenges faced by these types of systems.
Distributed systems have become an integral component of our increasingly digitized society, supporting the myriad services, industries, and business interactions that we take for granted. When we interact with a service, for example a browser search, we tend to believe that we are dealing with a single entity that retrieves the data we requested, ignoring complex inner workings such as how data is stored, which servers are involved, or how information reaches the browser. This is directly related to a concept in IT called abstraction, through which systems hide the complexity of an apparently simple operation from the end user. The same concept applies to every distributed system and software application we interact with on a daily basis.
The advent of the modern computer era in 1945 is a turning point in human evolution that marks the beginning of our ever-increasing reliance on technological solutions to increase the quality and quantity of business interactions, research, and development as well as ushering in new levels of availability of information to the public.
More than half a century later, in 2020, we find ourselves embarked on a fast-paced race towards the digital transformation of society on a fundamental level, where innovative software solutions and software architecture are key components of the strategic competitive differentiation between companies and industries. More than ever, enterprises are investing millions of dollars in technology and software solutions to better serve the interests and needs of their customer base, as well as to distance themselves from the competition. We have reached a point in the evolution of the technological landscape where service providers are required to design, provide, and maintain services that are always available, regardless of time and geographical position. This new standard of permanent availability, together with the need for functionality that can overcome the inevitable fluctuations and failures of the complex computation behind such an architecture, is handled by distributed systems.
The rise of distributed systems
In the early days of computing systems and information technology, computers were so big and cumbersome to use and maintain that they required rooms and even entire floors of a building for storage, as well as significant quantities of power to run. This made them rigid and limited from a performance and functional perspective as well as available only for a select few institutions that could afford the costs involved. Furthermore, the technology available at that time made it impossible for computers to connect with each other, so they operated independently.
In the mid-1980s two technology advancements shattered the barrier in connectivity between computers, enabling the creation of distributed systems.
The first breakthrough was the development of increasingly powerful microprocessors. Gradually, machines evolved from 8-bit to 16-bit, 32-bit, and 64-bit central processing units (CPUs). The emergence of multi-core CPUs facilitated the concept of parallelism in which multiple processes can be run at the same time, enabling the creation of more powerful and complex operations.
The second breakthrough was the emergence of high-speed computer networks. Local-area networks (LANs) allowed thousands of machines situated in close physical proximity to connect and communicate with each other. On the other hand, the creation of wide-area networks (WANs) enabled hundreds of millions of machines from all over the globe to communicate with each other. By removing geographical barriers, enterprises and corporations were now able to create the so-called “supercomputers”.
In addition, the rapid rate of miniaturization of computer systems, paired together with the increase in performance and the ability to communicate, meant that companies could create distributed systems that far surpass the capabilities of previous computers, at a fraction of the cost.
What is a distributed system
A distributed system can be broadly defined as a collection of autonomous computing elements that appears to its users as a single coherent system.
Although this definition may seem simplistic, it encompasses two key features present in every distributed system. The first is that, in essence, distributed systems are a collection of computing elements, commonly referred to as nodes, that are able to act independently of each other. The second is that, due to the layers of abstraction involved, users tend to believe that they are dealing with a single system. This characteristic implies that, in order to function as a single cohesive system, the collection of autonomous nodes needs to be able to communicate and collaborate to achieve a common goal. As such, the key challenge in building a distributed system is ensuring and maintaining a stable line of communication between all the elements that compose the network.
Types of distributed systems
Distributed systems usually fall into one of four different basic architecture models:
The client-server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service (server) and the party that makes a request for a service or resource (client). In a classic client-server architecture flow scheme, the client forwards a request for data and the server accepts the request, processes it, and sends back a response that contains the data packets requested by the client.
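The request and response flow described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not a networked implementation: the `Request`, `Response`, `Server`, and `Client` names are hypothetical, and the numeric status codes are borrowed from HTTP only by analogy.

```python
from dataclasses import dataclass


@dataclass
class Request:
    resource: str  # name of the resource the client asks for


@dataclass
class Response:
    status: int  # 200 = found, 404 = not found (HTTP-style codes, by analogy)
    body: str


class Server:
    """Owns the resources and answers requests for them."""

    def __init__(self, resources: dict[str, str]) -> None:
        self._resources = resources

    def handle(self, request: Request) -> Response:
        # Accept the request, process it, and build a response.
        if request.resource in self._resources:
            return Response(200, self._resources[request.resource])
        return Response(404, "")


class Client:
    """Sends requests to a server; in a real system this call would cross a network."""

    def __init__(self, server: Server) -> None:
        self._server = server

    def get(self, resource: str) -> Response:
        return self._server.handle(Request(resource))


server = Server({"greeting": "hello"})
client = Client(server)
found = client.get("greeting")
missing = client.get("unknown")
```

The client never touches the server's resources directly; it only sees the response, which is the division of responsibilities the model is built on.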
The three-tier architecture is a type of distributed software architecture that is composed of three tiers of logical computing that can be modified according to the business logic of the company:
- The presentation tier is the front-end layer in the three-tier system and usually consists of the user interface. This is what the software user sees and interacts with. This tier also acts as a go-between for the user and the logic tier, passing on the user’s different actions to the logic tier.
- The application logic tier contains the functional business logic which drives an application’s core capabilities. This tier is also the one that writes and reads data into the data tier.
- The data tier is represented by the storage system/database engine used by the system to store all the accumulated data. Information stored in the data tier is accessed by the application layer via API calls.
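As a rough sketch of how the three tiers relate, the Python example below wires them together in a single process. All class and method names are hypothetical, and an in-memory dictionary stands in for a real database engine; the point is only that each tier talks to the one directly beneath it.

```python
class DataTier:
    """Storage layer: an in-memory dict stands in for a database engine."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._rows[key] = value

    def load(self, key: str):
        return self._rows.get(key)


class LogicTier:
    """Business rules: validates input and reads/writes the data tier."""

    def __init__(self, data: DataTier) -> None:
        self._data = data

    def register_user(self, name: str) -> str:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        self._data.save(cleaned.lower(), cleaned)
        return cleaned

    def find_user(self, name: str):
        return self._data.load(name.strip().lower())


class PresentationTier:
    """User-facing layer: forwards user actions to the logic tier and formats results."""

    def __init__(self, logic: LogicTier) -> None:
        self._logic = logic

    def submit_form(self, raw_name: str) -> str:
        try:
            user = self._logic.register_user(raw_name)
        except ValueError:
            return "Error: name must not be empty"
        return f"Welcome, {user}!"


logic = LogicTier(DataTier())
ui = PresentationTier(logic)
message = ui.submit_form("  Ada ")
```

Because the presentation tier never reaches into storage, either the user interface or the database could be swapped out without touching the business rules.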
The n-tier architecture, also referred to as multiple-tier architecture, follows the same logic as the three-tier architecture. As the name implies, the main difference is that it can have additional logic tiers added to its architecture. This type of architecture is generally used when an application or server needs to forward requests to additional enterprise services on the network.
In a peer-to-peer (P2P) network, individuals can interact with each other without requiring the intervention of a third-party service to intermediate the exchange of data. This means that the system lacks any additional machines used to provide services and manage resources. In a P2P network, responsibilities are distributed among each node in the system, commonly referred to as peers, that can serve as either client or server.
Distributed system categories
A distributed data store, also known as a distributed database, is a computer network where data is stored on more than one node, usually in a replicated fashion. A large portion of distributed database systems are NoSQL non-relational databases that facilitate quick access to data over a large number of nodes. Depending on the use case, some distributed databases expose rich query abilities while others are limited to key-value semantics. As a rule, distributed data stores provide high levels of performance and scalability at the cost of consistency or availability.
The latter aspect is stipulated by the CAP theorem, introduced by Eric Brewer in 2000. The acronym stands for Consistency, Availability, and Partition tolerance. The CAP theorem states that a distributed data store cannot simultaneously be consistent, available, and partition tolerant; it can guarantee at most two of the three characteristics.
- Consistency – according to the CAP theorem, consistency means that all the nodes from a cluster see the same data at any given point in time. Furthermore, it also entails that once a client writes a value to any server and gets a response, it expects to get that value (or a fresher value) back from any server it reads from.
- Availability – read and write operations will always succeed even if a number of nodes are down. If a client sends a request to a server and the server has not crashed, then the server must eventually respond to the client. The server is not allowed to ignore the client’s requests.
- Partition tolerance – the system will continue to function even if nodes from a cluster can no longer communicate with each other.
Because distributed systems are by definition composed of multiple machines that communicate with each other, partition tolerance becomes mandatory for every system. For example, if you have two nodes that accept information and the connection between them is down, it is impossible for them to remain available while also providing consistency. The trade-off is the following: neither node knows what the other one is doing, so they can either go offline (unavailable) or work with inconsistent data. Furthermore, network latency can become an issue if you have a network of machines that need to synchronize to achieve strong consistency. These are some of the reasons why system designers usually tend to pair partition tolerance with availability.
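The two-node example can be made concrete with a small simulation. The sketch below is a toy model under simplifying assumptions, not how any particular database behaves: `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are hypothetical names, and replication is modeled as a direct assignment rather than a network call.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None


class TwoNodeStore:
    """Two replicas that either stay consistent (CP) or stay available (AP)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when the link between a and b is down

    def write(self, target: str, value: str) -> bool:
        replica, other = (self.a, self.b) if target == "a" else (self.b, self.a)
        if not self.partitioned:
            # Healthy network: the write is replicated to both nodes.
            replica.value = value
            other.value = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            return False
        # Availability first: accept the write locally and let replicas diverge.
        replica.value = value
        return True


cp = TwoNodeStore("CP")
cp.write("a", "v1")
cp.partitioned = True
cp_accepted = cp.write("a", "v2")  # rejected: the store is unavailable for writes

ap = TwoNodeStore("AP")
ap.write("a", "v1")
ap.partitioned = True
ap_accepted = ap.write("a", "v2")  # accepted: node b still holds the old value
```

In the CP store both replicas still agree after the partition, but the client was turned away; in the AP store the client was served, but a read from node `b` would now return stale data.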
Distributed computing is a technique through which components of a software system are shared among multiple computers to improve efficiency and performance. Prominently used in processing big data, it involves the splitting of an enormous task that no single computer can complete into smaller, more manageable tasks and distributing them between multiple machines that work in parallel to solve the task, after which the results are aggregated. The benefit of the distributed computing technique is that theoretically, you have the potential to scale horizontally indefinitely to increase computing capacity (to add new machines to the network that can take part of the workload).
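The split, process in parallel, and aggregate pattern can be sketched as follows. This is only an illustration: a thread pool from Python's standard library stands in for a cluster of machines, and the `split`, `worker`, and `distributed_sum_of_squares` functions are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list, parts: int) -> list:
    """Cut the input into at most `parts` chunks of roughly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(chunk: list) -> int:
    """The sub-task each 'machine' solves on its own chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data: list, workers: int = 4) -> int:
    chunks = split(data, workers)
    # Each chunk is handed to a separate worker; threads stand in for nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker, chunks))
    # Aggregate the partial results into the final answer.
    return sum(partials)


numbers = list(range(1, 1001))
total = distributed_sum_of_squares(numbers)
```

Scaling out in this model means raising `workers`: the sub-task and the aggregation step stay the same, only the number of chunks changes.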
Distributed file systems are very similar to distributed data stores, in the sense that they are systems tasked with storing and facilitating access to large amounts of data across a cluster of machines that appear as a single entity. Distributed file systems make it convenient to share information and files among users on a network in a controlled and authorized way. The server allows client users to share files and store data much as they would store information locally. However, the servers retain full control over the data and decide which clients are granted access.
Distributed messaging is based on the concept of reliable message queuing. Messages are queued asynchronously between client applications and messaging systems. A distributed messaging system provides the benefits of reliability, scalability, and persistence.
Messaging systems provide a central place for the storage and propagation of messages/events inside your overall system. They allow you to decouple your application logic from directly talking with your other systems.
Distributed messaging follows the publish-subscribe model where the senders of the messages are called publishers and those who want to receive the messages are called subscribers.
Once a message has been published by the sender, the subscribers can receive the message with the help of a filtering option. Filters can be topic-based or content-based. Popular messaging platforms include RabbitMQ, Apache Kafka, and ActiveMQ.
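A topic-based publish-subscribe flow can be sketched with a small in-memory broker. This is a simplified illustration and does not reflect the API of RabbitMQ, Kafka, or ActiveMQ: the `Broker` class and its methods are hypothetical, and delivery here is synchronous, whereas real messaging systems queue messages asynchronously and persist them.

```python
from collections import defaultdict


class Broker:
    """Central place that routes published messages to topic subscribers."""

    def __init__(self) -> None:
        # Maps a topic name to the list of handler callables subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
orders_inbox = []
audit_inbox = []

# Two subscribers filter on the "orders" topic; one also listens to "payments".
broker.subscribe("orders", orders_inbox.append)
broker.subscribe("orders", audit_inbox.append)
broker.subscribe("payments", audit_inbox.append)

delivered = broker.publish("orders", "order #1 created")
broker.publish("payments", "payment received")
unrouted = broker.publish("shipping", "parcel sent")  # no subscribers for this topic
```

The publisher only knows the topic name, never the subscribers, which is what decouples the application logic from the systems that consume its messages.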
Distributed applications are software applications that are stored and executed over distributed networks such as P2P networks, cloud platforms, and blockchain networks. In a client-server structure, distributed applications run simultaneously on the server and client machine. The front end of the operation runs on the client computer, while the back-end operations that require more processing power unfold on the server-side.
Distributed ledgers are immutable, append-only database systems that are replicated, synchronized, and shared across all the nodes that compose the distributed network.
The most popular implementation of distributed ledger technology (DLT) is blockchain, a distributed ledger of economic transactions that can be programmed to record not only financial transactions but virtually any type of data that has value.
Blockchain is a digitized, distributed database that records all the information introduced in a decentralized peer-to-peer network in structures called blocks. Each block contains transaction data and metadata (a set of data that provides information about the respective block). The advantage of this structure is that each block is built upon the previous one in a chain-like structure (hence the name blockchain): the hash of the previous block is stored in the new block and is included when the new block's own hash is calculated. This design is what gives the data introduced in the blockchain its immutability and integrity. If a malicious actor attempts to alter the data in a block, the block's hash changes and no longer matches the reference stored in the following block, so the change is detected by the other network participants and all subsequent blocks are rendered invalid.
These design choices make blockchain well suited for data storage. Because it is an append-only structure, data can only be added to the system and can never be completely deleted. Any changes made are stored further down the chain, and an admin can always see when the changes occurred, who made them, and what the previous version of the data was. When a new block is created and appended to the blockchain, all the information it contains becomes available to every member of the network. Once recorded, the data in any given block cannot be altered retroactively without altering all subsequent blocks, which requires the collusion of the network majority.
In the past decade, blockchain has become deeply ingrained in the discourse of savvy entrepreneurs who recognize the potential of this technology to streamline operations across multiple key industries and spheres of activity. This is because blockchain comes with a series of inherent characteristics that make it an ideal storage medium for sensitive information: transparency, data immutability and integrity, distribution, and complex cryptographic algorithms. These features enable blockchain to enhance audit processes, settle disputes, and safeguard company data from cyberattacks.
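The way hashes link blocks together, and why tampering is detectable, can be shown with a minimal hash chain. This sketch assumes SHA-256 and a JSON encoding of each block, and it omits consensus, networking, and signatures entirely; the `block_hash`, `append_block`, and `is_valid` helpers are hypothetical names, not part of any blockchain platform.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    """Hash the block's contents, including the previous block's hash."""
    fields = {key: block[key] for key in ("index", "prev_hash", "data")}
    payload = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Append-only write: the new block records the hash of the one before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {"index": len(chain), "prev_hash": prev_hash, "data": data}
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: list) -> bool:
    """Recompute every hash and check each link to the previous block."""
    expected_prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
        expected_prev = block["hash"]
    return True


chain = []
for entry in ("alice pays bob 5", "bob pays carol 2", "carol pays dave 1"):
    append_block(chain, entry)

valid_before = is_valid(chain)

# A tamperer edits block 1 and even recomputes that block's own hash...
chain[1]["data"] = "bob pays mallory 200"
chain[1]["hash"] = block_hash(chain[1])

# ...but block 2 still references the old hash, so the chain no longer verifies.
valid_after = is_valid(chain)
```

To hide the change, the tamperer would have to recompute every later block as well, which is where the requirement for majority collusion comes from in a real network.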
Why use distributed systems?
Distributed systems emerged from the need to solve increasingly complex computational problems that cannot be solved by a single machine. More often than not, distributed systems are very difficult to deploy, maintain, and debug, but they are worth the effort and costs involved due to their performance, scalability, low latency, and fault tolerance.
Reliability and fault tolerance
By definition, distributed systems are composed of multiple nodes that work towards achieving a common goal. Thanks to this architecture, the system is considered reliable and fault-tolerant: even if some of the nodes that compose the network are compromised, the application will continue to function.
Availability refers to the total amount of time the system is available for end-use applications. In general, distributed systems are characterized by high levels of availability. Because they are composed of multiple machines, distributed systems can redistribute the workload during maintenance procedures to ensure business continuity.
Latency measures how much time it takes for a data packet to travel from one designated point to another. Ideally, latency should be as close to zero as possible, but the time it takes a network packet to travel around the world is bounded by the laws of physics, namely the speed of light. For example, the shortest possible round trip for a request over a fiber-optic cable between New York and Sydney is roughly 160 ms. No system can respond faster than that between the two cities, so the goal is to come as close as possible to that value. The advantage of distributed systems is that nodes can be distributed geographically (in our example, one node in each city), allowing traffic to hit the node closest to the user.
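The round-trip figure for New York and Sydney can be reproduced with back-of-the-envelope arithmetic. The two constants below are rounded assumptions rather than measured values: a great-circle distance of about 16,000 km, and a signal speed in optical fiber of about 200,000 km/s (roughly two thirds of the speed of light in a vacuum).

```python
DISTANCE_KM = 16_000             # assumed New York to Sydney distance, rounded
SPEED_IN_FIBER_KM_S = 200_000    # assumed signal speed in fiber, about 2/3 of c

# Time for a packet to travel one way, converted from seconds to milliseconds.
one_way_ms = DISTANCE_KM * 1000 / SPEED_IN_FIBER_KM_S

# A request and its response each cross the distance once.
round_trip_ms = 2 * one_way_ms

print(f"one way: {one_way_ms:.0f} ms, round trip: {round_trip_ms:.0f} ms")
```

Real cables do not follow the great-circle path and routers add processing delay, so measured latency between the two cities is higher than this lower bound.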
Depending on the type of system involved, scalability can mean many different things, but in general, it is the property of a system to handle a growing workload by adding additional resources and without hindering performance. In short, a scalable system is one that is able to continue its operations within optimal parameters as its user base, workload, and resources begin to grow.
- Size scalability: the ability to add more users and resources to a system without a noticeable decrease in performance
- Geographical scalability: a system that can continue to function efficiently and can be accessed by its users regardless of the distance between the users and the resources of the system without any noticeable delay in communication
- Administrative scalability: the ability to scale a distributed system across multiple, independent administrative domains without hindering performance
- Functional scalability: the ability to expand the functionality of a system without hindering the ongoing operations
- Load scalability: the ability of a system to expand and contract to accommodate heavier or lighter workloads
- Generation scalability: the ability of a system to increase performance by adopting a new generation of components (a faster processor, a faster memory, newer version of an operating system, a more powerful compiler) without disrupting the operational flow
- Heterogeneous scalability: the ability of a system to scale up by integrating hardware and software components from different vendors. This calls for using components with a standard, open architecture, and interface. In software, this property is called portability
Vertical scaling, also known as scaling up, is a technique through which the performance of an existing system is increased by upgrading to the latest hardware components. The problem is that this technique is limited by the performance of the hardware components available on the market, which is often insufficient for technology companies that require a large pool of computing power to cope with their workload.
Horizontal scaling, also referred to as scaling out, is a technique employed in distributed systems to increase the computational power of the network by adding new nodes, rather than upgrading the hardware of a single node. Besides being a cheaper alternative to vertical scaling, the main advantage of horizontal scaling is its seemingly unlimited potential to add new machines to the network to increase performance.
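The effect of scaling out can be illustrated with a toy load distribution. The sketch assumes requests are spread evenly in round-robin fashion and that every node is identical, which real systems only approximate; the `distribute` function and the node names are hypothetical.

```python
from itertools import cycle


def distribute(requests: int, nodes: list) -> dict:
    """Assign requests to nodes round-robin and return the load per node."""
    if not nodes:
        raise ValueError("at least one node is required")
    load = {node: 0 for node in nodes}
    # zip stops once the request counter is exhausted; cycle repeats the nodes.
    for _, node in zip(range(requests), cycle(nodes)):
        load[node] += 1
    return load


three_nodes = distribute(1200, ["node-1", "node-2", "node-3"])

# Scaling out: adding a fourth node lowers the load each node has to carry.
four_nodes = distribute(1200, ["node-1", "node-2", "node-3", "node-4"])
```

With the same 1,200 requests, each node handles 400 in the three-node cluster and 300 once a fourth node joins, without any single machine being upgraded.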