ZooKeeper Notes

Introduction to ZooKeeper

Introduction to Distributed System

Many independent programs running on a set of different computers.

Introduction to Coordination Service

A coordination service is a service which

  1. communicates with all computers in the system.
  2. coordinates all computers in the system.
  3. manages all computers in the system.
  4. controls all computers in the system.

What is ZooKeeper

Apache ZooKeeper is a coordination service for distributed systems. It is

  • Open source
  • Reliable & scalable
  • High-performance
  • Lightweight

Distributed System

Understanding distributed system

A distributed system is a software system composed of independent computing entities, linked together by a computer network, whose components communicate and coordinate with each other to achieve a common goal.

Characteristics of distributed system

  1. Resource sharing:
    This refers to the possibility of using the resources in the system, such as storage space, computing power, data, and services, from anywhere.
  2. Extensibility:
    This refers to the possibility of extending and improving the system incrementally, from both hardware and software perspectives.
  3. Concurrency:
    This refers to the system’s capability to be used by multiple users at the same time to accomplish the same task or different tasks.
  4. Performance and scalability:
    This ensures that the response time of the system doesn’t degrade as the overall load increases.
  5. Fault tolerance:
    This ensures that the system remains available even if some of the components fail or operate in a degraded mode.
  6. Abstraction through APIs:
    This ensures that the system’s individual components are concealed from the end users, revealing only the end services to them.

Fallacies of distributed computing

False assumptions

  • The network is reliable
  • Latency is zero
  • Bandwidth is infinite
  • The network is secure
  • Topology doesn’t change
  • There is one administrator
  • Transport cost is zero
  • The network is homogeneous

Challenges while implementing coordination in distributed system

  • Dependence on the master node: everything depends on the master. If the master goes offline, all clients that depend on it become unavailable.
  • Service discovery
    1. To increase the availability of the application and to bear the load, we add more servers.
    2. We need to tell client machines about the availability of the new servers.
    3. To do this, we need to implement that logic in the client carefully.
  • Scalability
    1. Scalability means the ability of the system to grow in size.
    2. Adding new machines also increases the chance that some part of the system fails.
    3. Common causes of failure are hardware faults, system crashes, and communication link failures.
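
The service-discovery steps above can be sketched as a shared registry that clients consult instead of a hard-coded server list. This is a minimal in-memory illustration, not ZooKeeper's API; the names `Registry`, `register`, `deregister`, and `pick`, and the addresses used, are hypothetical.

```python
import itertools


class Registry:
    """Toy in-memory registry standing in for a coordination service."""

    def __init__(self):
        self._servers = []                 # addresses of servers believed to be alive
        self._counter = itertools.count()  # drives round-robin selection

    def register(self, address):
        """A new server announces itself."""
        if address not in self._servers:
            self._servers.append(address)

    def deregister(self, address):
        """A server leaves or is detected as failed."""
        if address in self._servers:
            self._servers.remove(address)

    def pick(self):
        """Return the next registered server, round-robin; raise if there is none."""
        if not self._servers:
            raise LookupError("no servers available")
        return self._servers[next(self._counter) % len(self._servers)]


registry = Registry()
registry.register("10.0.0.1:8080")
registry.register("10.0.0.2:8080")
first = registry.pick()
second = registry.pick()
registry.deregister("10.0.0.1:8080")   # simulate a server going away
after_failure = registry.pick()
```

Because clients ask the registry on every request, adding or removing a server needs no change on the client side.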

Introduction to Apache ZooKeeper

  • Apache ZooKeeper is software developed by the Apache Software Foundation.
  • It acts as a centralized service and maintains naming and configuration data.
  • It also provides flexible and robust synchronization within distributed systems.

Architectural Services of ZooKeeper

  1. Distributed Consensus:
    It is agreement on a single data value among a group of processes connected by an unreliable network.
  2. Group Management:
    Managing groups of nodes in a distributed environment requires a high-performance coordination service.
  3. Presence Protocol:
    This defines presence-related extensions to session initiation. This set of protocols describes how terminals use other protocols to establish, modify, and terminate sessions.
  4. Leader Election:
    Leader election is the process of designating a single process as the organizer of some task distributed among several nodes.
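
One common way to elect a leader with ZooKeeper is to give every candidate a sequence number and treat the lowest number as the leader. The sketch below simulates that idea in memory. It is not ZooKeeper client code, and `Election`, `join`, `leave`, and `leader` are hypothetical names.

```python
class Election:
    """Toy simulation: the candidate holding the lowest sequence number leads."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> candidate name

    def join(self, name):
        """Register a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = name
        return seq

    def leave(self, seq):
        """Remove a candidate, for example because its process crashed."""
        self._candidates.pop(seq, None)

    def leader(self):
        """Return the name of the current leader, or None if nobody is running."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = Election()
seq_a = election.join("node-a")
seq_b = election.join("node-b")
seq_c = election.join("node-c")
first_leader = election.leader()
election.leave(seq_a)            # the leader disappears
next_leader = election.leader()  # the next-lowest candidate takes over
```

Sequence numbers are never reused, so a candidate that rejoins goes to the back of the queue instead of displacing the current leader.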

ZooKeeper coordination tasks

With ZooKeeper, developers can implement common distributed coordination tasks, such as the following:

  1. Configuration management
  2. Naming service
  3. Distributed synchronization, such as locks and barriers
  4. Cluster membership operations, such as detection of node leave/node join
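
Detecting node join and node leave (item 4) comes down to comparing two snapshots of the group's member list. A minimal sketch, assuming the client can read the current members as a list of strings; `diff_members` is a hypothetical helper, not a ZooKeeper API.

```python
def diff_members(previous, current):
    """Return (joined, left) as sorted lists, given two membership snapshots."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)


# node-a left and node-c joined between the two snapshots
joined, left = diff_members(["node-a", "node-b"], ["node-b", "node-c"])
```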

ZooKeeper Ensemble

In a production environment, ZooKeeper should run on multiple servers in replicated mode. This set of servers is called a ZooKeeper ensemble.

The minimum recommended number of servers is three, and five is the most common in a production environment.

Chapter 2

Understanding ZooKeeper Services

  • ZooKeeper is a coordination service for distributed applications.

  • Its objective is to solve the difficult issues associated with coordinating the components of a distributed application.

  • It does this by exposing a simple yet powerful interface of primitives.

  • Applications can be built on these primitives, which are available through the ZooKeeper APIs.

  • It can solve problems such as

    • distributed synchronization
    • cluster configuration management
    • group membership
  • ZooKeeper is a replicated, distributed application intended to run as a service.

  • The replicated set of servers on which the ZooKeeper service runs is called a ZooKeeper ensemble.

  • Clients can connect to the ZooKeeper service by connecting to any member of the ensemble.

  • The members of the ensemble are aware of each other’s state.

  • They save information durably in a local data store.

  • ZooKeeper is a highly available service.

  • As long as a majority of the servers are available, the service remains available.
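
The majority rule explains the ensemble sizes recommended earlier. The arithmetic can be sketched as follows; `quorum_size` and `tolerated_failures` are hypothetical helper names used only for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# ensemble size -> (quorum size, failures tolerated)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

A three-server ensemble survives one failure and a five-server ensemble survives two. A four-server ensemble still survives only one, which is one reason odd sizes are the usual choice.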

ZooKeeper Data Model

  • ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers.

  • The namespace looks quite similar to a Unix file system.

  • The data registers are known as znodes in ZooKeeper.

  • Znodes are organized hierarchically, like a standard file system.

  • For example, the root node may have one child znode called /zoo.

  • Every znode in the ZooKeeper tree is identified by a path.

  • The path elements are separated by /.

  • Znodes are called data registers because they can store data.

  • A znode can have children as well as data associated with it.

  • Every znode maintains a stat structure.

  • A stat simply provides the metadata of a znode.
    It consists of

    • Version number
    • Access Control List (ACL)
    • Data length
    • Timestamp
  • Version number:
    Every znode has a version number. Every time the data associated with the znode changes, its version number increases.
    The version number matters when multiple ZooKeeper clients try to perform operations on the same znode.

  • Access Control List (ACL):
    It is the access control mechanism for the znode.
    It governs all the znode read and write operations.

  • Data length:
    It is the total amount of data stored in a znode.
    A znode can store a maximum of 1 MB of data.

  • Timestamp:
    It records when the znode was created and last modified, usually in milliseconds.
    ZooKeeper identifies every change to the znodes by a transaction ID (zxid).
    Each zxid is unique, and zxids increase with every change, so you can tell which of two changes happened first.
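
The data model above can be illustrated with a small in-memory imitation of a znode tree. This is a sketch under stated assumptions, not the real ZooKeeper client API: `ToyZnodeStore`, `BadVersionError`, and the stat field names are chosen for illustration, and ACLs are left out.

```python
import time


class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the znode's."""


class ToyZnodeStore:
    """In-memory imitation of a znode tree (illustrative only)."""

    MAX_DATA = 1024 * 1024  # 1 MB limit on the data stored in one znode

    def __init__(self):
        self._zxid = 0
        self._nodes = {"/": self._new_node(b"")}

    def _new_node(self, data):
        self._zxid += 1  # every change gets a new, larger transaction ID
        now_ms = int(time.time() * 1000)
        return {"data": data, "version": 0, "ctime": now_ms, "mtime": now_ms,
                "czxid": self._zxid, "mzxid": self._zxid}

    def create(self, path, data=b""):
        """Create a znode; its parent must already exist."""
        parent = path.rsplit("/", 1)[0] or "/"
        if path in self._nodes or parent not in self._nodes:
            raise ValueError(f"cannot create {path}")
        if len(data) > self.MAX_DATA:
            raise ValueError("data exceeds 1 MB")
        self._nodes[path] = self._new_node(data)

    def set_data(self, path, data, expected_version):
        """Overwrite the data only if the caller saw the latest version."""
        node = self._nodes[path]
        if node["version"] != expected_version:
            raise BadVersionError(path)
        if len(data) > self.MAX_DATA:
            raise ValueError("data exceeds 1 MB")
        self._zxid += 1
        node.update(data=data, version=node["version"] + 1,
                    mtime=int(time.time() * 1000), mzxid=self._zxid)

    def stat(self, path):
        """Return the metadata of a znode as a plain dict."""
        node = self._nodes[path]
        return {"version": node["version"], "data_length": len(node["data"]),
                "ctime": node["ctime"], "mtime": node["mtime"],
                "czxid": node["czxid"], "mzxid": node["mzxid"]}

    def children(self, path):
        """Return the full paths of the direct children of a znode."""
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self._nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])


store = ToyZnodeStore()
store.create("/zoo", b"animals")
store.create("/zoo/duck")
store.set_data("/zoo", b"more animals", expected_version=0)
try:
    # A second client still believes the version is 0, so its write is refused.
    store.set_data("/zoo", b"stale write", expected_version=0)
    stale_write_rejected = False
except BadVersionError:
    stale_write_rejected = True
zoo_stat = store.stat("/zoo")
```

The version check is what lets several clients update the same znode safely: a write based on outdated data fails instead of silently overwriting a newer value.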

Have fun.