A distributed system is a computer system composed of multiple interconnected computers that communicate and coordinate their actions through message passing to achieve a common goal. Examples: microservices, cloud-computing platforms, peer-to-peer networks, blockchains, and distributed databases.
Why distributed systems?
- Scalable
  - can handle large data volumes and large numbers of transactions
- Reliable/Resilient
  - Reliability is the probability that a system performs its function without failure over a given period
  - the aim is to ensure that the system doesn’t go down
  - eliminate every single point of failure, i.e. be fault tolerant
    - this is done through redundancy/replication of both components and data
    - Replication means sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault tolerance, or accessibility
- Available
  - Availability is the percentage of time that a system, service, or machine remains operational under normal conditions
  - the ideal is no downtime, i.e. operational 100% of the time
  - Reliability often implies availability, but not necessarily the other way around
- Efficient
  - Latency/response time: delay to obtain the first item
  - Throughput/bandwidth: number of items delivered per unit of time
- Manageability/Maintainability
  - Ease of operation and maintenance
  - Ease of diagnosis: observability through traces and logs
  - Ease of updating: code and infrastructure updates
- Modularity and flexibility
  - to an extent, though that is not the main goal
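
The availability and redundancy points above can be made concrete with a little arithmetic. The Python sketch below is only illustrative: the function names are made up for these notes, and the replica formula assumes that replicas fail independently and that one working replica is enough to serve requests, which real systems only approximate.

```python
HOURS_PER_YEAR = 365 * 24


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total time the system was operational."""
    return uptime_hours / (uptime_hours + downtime_hours)


def replicated_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies.

    Assumption: failures are independent and any one healthy copy
    can serve requests, so the group is down only if all copies are down.
    """
    return 1 - (1 - single) ** replicas


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumption: component failures are independent.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


def downtime_hours_per_year(a: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    one_node = 0.99
    two_nodes = replicated_availability(one_node, 2)
    print(f"one node:  {one_node:.4%}, {downtime_hours_per_year(one_node):.1f} h/year down")
    print(f"two nodes: {two_nodes:.4%}, {downtime_hours_per_year(two_nodes):.2f} h/year down")
```

Under these assumptions, adding a second replica to a 99%-available node cuts expected downtime by roughly a factor of one hundred, which is why removing single points of failure matters so much. A chain of components that must all be up (`serial_availability`) moves in the opposite direction: each extra dependency lowers the overall figure.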
See also Evaluation of Software Systems