关系数据库管理系统(RDBMS)笔记

关系型数据库管理系统（Relational Database Management System，RDBMS）

亲，双语对照

名词解释

候选键：表中可以唯一确定一个元组的某个属性（或者属性组）叫候选键
主键 : 从许多候选键(可包含多个属性)中挑一个用来维护记录的完整性就叫主键。
外键：一个属性（或属性组），它不是这个表的主键，但是它别的表的主键，它就是这个表的外键。
主属性：一个属性只要在任何一个候选键中出现过，这个属性就是主属性。
非主属性：与上面相反，没有在任何候选键中出现过，这个属性就是非主属性。

数据模型

A data model is a description of the data in a database. In addition, the data model describes the relationship among data, and any constraints that have to be defined on the data. Data models can broadly be classified into the following categories.

Object-based logical model: Focuses on desctibing the data, the relationship among the data, and any constraints defined.
Record-based logical model: Focuses on specifying the logical structuring of the database.

ER模型(ER Model)

ER模型是基于对象的逻辑模型(Object-Based Logical Model)
“实体”-“关系”模型，是描述数据及其联系的数据模型
The ER model views the real world as a collection of objects or entities and the relationship among them.

实体

实体：客观存在并可以相互区分的事物
Entities:a thing, whick can be easily identitfied.

属性

属性：实体所具有的若干特征
Attributes: an attribute is a property of given entity.

关系

关系：实体之间的联系
Relationships: an association among entities.
A relationship type is an association of entity types. A relationship instance is an association of entity instances.
A relationship can associate an entity with itself.
关系类型：

One-to-One
One-to-Many(or Many-to-One)
Many-to-Many

子类型和超类型

子类型是另一个实体的子集
A subtype is a subset of another entity.
超类型就是那个完整的集合咯
A subentity or subtype is always dependent on the supertype for its existence.The attributes of a supertype apply to all of its subtypes. The converse is not true.
The supertype is connected to the relationship with a line containing a crossbar, as indicated in the preceding diagram. The attributes, which describe supertype will also belong to all subtypes. The subtype is described by the attributes that are unique to it.

基于记录的逻辑模型(Record-Based Logical Model)

层次模型：树形结构
网状模型：不加任何条件限制的无向图
关系模型：采用二维表来表示，满足一定条件的二维表称为关系；二维表中凡能唯一标识元祖的最小属性集称为该表的键或码。

Hierarchical Model(层次模型)

In this model, data is represented in the form of a tree, and relationships between the data are represented by links.

Network Model(网状模型)

In this model, the data and relationships among them are represented in the form of records and links. It is similar to a hierarchical model. However, records in a databse are represented by graphically.

Relational Model(关系模型)

In this model, the table in a database has a fixed length record with fixed number of attributes or fields. Of these three models, the relational model is the most popular.

关系模型

关系数据结构

An RDBMS can be defined as a database management system where all the data visible to the user is organized strictly as tables of data values, and all the database operations work on these tables.

The basic principle in a relational database is a tabular arrangement of data values. Each table in a databse has a unique table name that identifies its contents.

The row(or record) in the table is called a tuple, and the column(or field) is called an attribute. Every column in a table must have a unique name. The number of tuples is called the cardinality of the table, and the number of attributes is called the degree of the table.
One important property of a relation is that its rows are unordered. Every table must have some column or conbination of columns that uniquely identifies each row in the table.
No two rows of table contain the same values in that column or column conbination. This column is called the primary key.
A table where every row is different from all other rows is called relation.
Each attribute of a table has an associated domain(域). A domain is a collection of values from which one or more attribute(columns) draw their actual values. The domain lists the possible values for an attribute. (域就是所有取值可能)

ER图(ER Diagram)

矩形框：表示实体，在框中记入实体名。
菱形框：表示联系，在框中记入联系名。
椭圆形框：表示实体或联系的属性，将属性名记入框中。对于主属性名，则在其名称下划一下划线。
连线：实体与属性之间；实体与联系之间；联系与属性之间用直线相连，并在直线上标注联系的类型。（对于一对一联系，要在两个实体连线方向各写1；对于一对多联系，要在一的一方写1，多的一方写N；对于多对多关系，则要在两个实体连线方向各写N,M。)

关系运算(Relational Operators)

Restrict 约束限制

The restrict operator extracts specified tuple or rowss from a given relation besed on a condition.
约束：根据条件提取几个元组

Project 投影

The project operator extracts specified attributes or columns from a given relation.
提取几列成为新的表

Produce 乘

The product operator joins two relations such that every tuple of the first relation is matched with every tuple of second relation. To be product compatible, the two tables must have common attributes.

Union 并

The union operator builds a relation from tuples appearing in erither or both of the specified relations where as intersect operator builds a relation consisting of tuples that appear in both the relations.

Intersect 交

The intersect operator builds a relation consisting of tuples that appear in both the relaiton.

Difference 差

The difference operator builds a relation of tuples appearing in the first but not in the second of the two specified relation.

Join 连接

The join operator builds a relation from two specified relations. This relation consists of all the possible combinations of tuples, one from each relation that satisfies the specified condition. The join operation requires a common attribute.

Divide 除

The divide operator takes two relations and builds another relation consisting of values of an attribute of one relation that match all the values in the other relation. The divide operation is the opposite of the product operation.

数据库设计(Database design)

将图标映射到表格

ER图中几个重要的组成

Regular entities
Attributes
Relationships
Weak entities
Subtypes and supertypes

Weak entities

A weak entities is an entity whose existence depends on some other entity. The weak entities have the composite key, part of which consists of the primary key of the entity they depend on.

基于逻辑的数据库设计

Attributes

It identifies its owner entity.
It refers to another entity.
It simplifies the description of an entity.

Keys

Primary
Foreign
Candidate
Alternate
Composite

Any attribute(or set of attributes) that uniquely identifies a row in table is a candidate for the primary key(主键). Such an attribute is called a candidate key(候选键). One of the candidate keys is chosen to be the primary key, based on the familiarity and greater usage. Any attribute that is a candidate for the primary key but is not the primary key is called the alternate key(替代键). When the key that uniquely identifies the row of the table is made up of more than one attribute, it is called a composite key(组合键).

Entities

Some attributes may acquire further attributes to qualify themselves during databses design and become entities.

Subentities

Specialization(特化) is the result of taking a subset of a higher-level entity set to form a lower-level entity set.
Generalization(概化) is the result of taking the union of two or more lower-level entity set to produce a higher-level entity set. It is the opposite of specialization.

规范化定义(范式)

Normalization is a method of breaking down complex table structures into simple table structures by using certaion rules. Using this mathod, you can reduce redundancy in a table, and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information.

Normalization has the following benefits

It helps in maintaining data integrity(完整).
It helps in simplifying the structure of tables, therefore, making a database more compact(简洁).
It helps in reducing the null values, which reduces the complexity of data operation.

Achieve a good database design

Each table should have an identifier
Each table should store data for a single type of entity.
Columns that accept NULLs should be avoided.
The reprtition of values or columns should be avoided.

第一范式(1NF)

A table is said to be 1NF when each cell of the table contains precisely one value.
定义：符合1NF的关系中的每个属性都不可再分解, 并且在一条记录里一个属性只能对应一个值

Guidelines

Place the related data values in a table. Further, define similar data values with the column name.
There should be no repeating group in the table.
Every table must have a unique primary key.

Functional dependency(函数依赖)
如果属性A依赖B，那么B就叫决定因素determinant

函数依赖就是对于每个x都有唯一的y与之对应，记作x->y
部分依赖(partial dependency) 不完全依赖于键，如有组合键时只依赖于组合键中的某一个键
依赖传递(transitive dependency) x->y->z

第二范式(2NF)

A table is said to be in 2NF when:

It is in 1NF, and
No partial dependency exists between non-key attributes and key attributes.
定义：符合1NF的前提下, 当表存在组合主键的时候, 每一个非键属性都必须完全依赖于组合主键中的所有主键, 而不能只依赖部分主键

Guidelines

Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key. Place them in a different table.
Group the remaining attributes.

第三范式(3NF)

A relation is said to be in 3NF if and only if:

It is in 2NF, and
No transitive(indirect) dependency exists between non-key attributes and key attributes.

定义：符合2NF的前提下, 每一个非主键属性和主键之间不能有依赖传递(间接依赖)

Guidelines

Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key. Place them in a different table.
Group the remaining attributes.

BCNF(鲍依斯-科得范式)

The original definition of 3NF was not sufficient in some situations.
不满足的情况

That had multiple candidate keys.
Where the multiple candidate keys were composite.
Where the multiple candidate keys overlapped(had at least one attribute in common).
A relation is in BCNF if and only if every determinant is candidate key.

定义：符合3NF的前提下, 同时主属性内部没有部分依赖或者依赖传递关系

Guidelines

Find and remove the overlapping candidate keys. Place the part of the candidate key an th attribute it is functionally dependent on, in a different table.
Group the remaining items into a table.

逆范式

The intentional introduction of redundancy in a table in order to improve performance is called denormalization.
逆范式化指的是根据项目情况故意通过增加冗余或重复的数据来提高数据库的查询性能(从而避免频繁使用性能较差的连接语句或子查询语句)