In the last article, we saw through the stages involved in a Database design. These stages were the categories where the effort involved can be modularised and cumulatively integrated at start of each stage.
Now there are other logical concepts required in logical and physical stage of database design. Database developers and administrators shall be competent enough to foresee the production roadmap and technical challenges in the future. In the article, I shall familiarise the basic building blocks and terminologies.
2. Entity Relationship Model
Entity Relationship Model is the one which is designed in Logical phase of DB design. It refers to the logical design of data as entity and its relationship with other entities. In technical terms, “entity” is a table which stores information, while “Relationship” is the relation between the entities.
A sample ER model
EMPLOYEE and DEPARTMENT are entities which store Employee and Department information in the tables. Consider few statements as below
Employee Id of an employee cannot be NULL or duplicate.
Every employee must belong to a Department.
Above two statements are enough to create an ER model for the data.
Relation database design stands on the fundamentals proposed by Mr. E.F. Codd. He identified 12 rules which a relational database must satisfy. Some references also include an additional rule which says that “A Relational Database Management System must be able to use it relational capabilities to manage the stored data”.
Codd’s rules of Database Design
Synopsis of the remaining rules are listed as below
1. All information in the database must be stored as values in a table
2. All database information must be accessible through the combination of a table name, primary key and column name.
3. The database must use NULL values to indicate missing or unknown information
4. The database schema must be described using the relational database syntax
5. The database may support multiple languages, but it must support at least one language that provides full database functionality (e.g. SQL)
6. The system must be able to update all updatable views
7. The database must provide single-operation insert, update and delete functionality
8. Changes to the physical structure of the database must be transparent to applications and users.
9. Changes to the logical structure of the database must be transparent to applications and users.
10. The database must natively support integrity constraints.
11. Changes to the distribution of the database (centralized vs. distributed) must be transparent to applications and users.
12. Any languages supported by the database must not be able to subvert integrity controls.
Data Normalization is the activity used to organize data in the database to sweep out redundant data and ensure sensible relationships. A normalized database not only ensures logical storage of data but also upgrades performance and data interactivity. A database can be normalized up to below degrees.
First Normal Form – Moving repetitive attributes in a table to a separate entity. Consider the below example of ORDER data for a day for a given Customer. Note that a Customer Id 100 orders 3 items in the day. In non-normalized state, all attribute values are stored in the same record. Note the iteration of the NAME, QTY, and POS attributes. In normalized state, the repeating attributes are segregated in separate records.
Second Normal Form – Once the data passes the 1NF, the attributes which are same in multiple rows can be moved to separate tables. 2NF talks about creating relationship between the two tables where one table contains the primary key data, while the other one contains the repetitive data for each key value.
Consider an example where Customer data needs to be normalized. A record contains Customer Id as key, its details and Item details. Since items and customer share a relationship, they logically appear in multiple records. But, customer details, which do not share any relation with items, unnecessarily repeat in the records. They can be maintained in separate table.
Third Normal Form
Once the data passes through 1NF and 2NF, it is eligible for 3NF. It refers to the removal of non key attributes. The attributes which are not dependent on the primary key of the table must be grouped as a separate entity.
In the below example, Customer and Items share relationship. The entity “Item” contains non key attributes like Item Mfg date and Item Prod date, which are not dependent on the Order Number Key. Instead, they describe a property of an Item.
The normalization levels proposed by E.F. Codd are considered as standard techniques. Additionally, Boyce Codd Normal Form (BCNF) addresses dependencies between attributes and Candidate key. Candidate key is the column which is eligible to be a primary key. If candidate key becomes primary key, then 1NF, 2NF, and 3NF apply to BCNF classification also.
Below are the additional normalization levels.
Fourth Normal Form: Multi-valued dependencies are broken down into separate entities. This implies that the attributes which are not dependent on the primary key can be segregated into different tables.
In the example, the table records the Marks obtained by a student in Term and Final exams in three subjects. Since Term and Final attributes are mutually exclusive and key column dependent, they can be treated as separate entities.
Fifth Normal Form:It is also called projection-join Normal form. A data satisfies 5NF if a unique pair is formed by the combination of one or more candidate keys. This implies that a set of data can be represented in more than one small entity.
4. Additional terms used in ER Modeling
In addition to above fundamentals, below are some terms which are required to be familiar with during DB design process. Below are few of them
1. Entity and Relationship – The objects whose information is required to be stored is called as Entity. It represents a thing or an event. An entity has differentiable characteristics and optionally, it can have descriptive characteristics too.
An entity can be associated to another entity through a defined Relationship.
In a statement “Oracle is best DBMS in the world”, the terms ‘Oracle’ and ‘DBMS’ are entities while ‘is best’ defines the relationship between the two. For example, ‘Employee’ is an entity.
Advantages of ER Model
• Easy to understand
• Eases the physical design from Logical design
• The ER model is focused on Data and not on process or methodology.
2. Attributes – Every entity contains logical set of attributes which define a characteristic of the entity by storing a value. For an entity ‘Employee’, ‘Salary’ and ‘Department’ are attributes.
3. Cardinality of Relationships – Relationship between the entities (tables) can exist in three forms as listed below.
a. One-to-one relationship – Two entities are related by a single relationship. For example, a Manager can manage one and only one Department.
b. One-to-many relationship – An entity can share multiple relationships with the other entity. For example, a Department can contain multiple Employees.
c. Many-to-many relationship – Two entities can share multiple relationship with each other. For example, a Master Part can contain several Parts, while a Part can be used in multiple Master Parts.
4. Primary Key – It is a column in the table which carries a unique and definite value throughout the table.
5. Foreign Key – It is a column in the Child table which always points to an attribute value in its Parent table.
Further to the concepts described in last and current articles, few other considerations are possible like chosing of DBMS and Data Models, rectification of data anomalies, business layer logic.
Possible DBMS can be Relational, Hierarchical and Network. Relationl is the widely used model. Hierarchical design involves Object or Object Relational Design while Network stores the Data as pointers to specify relationship between the entities.
I shall cover additional topics and concepts of Database Design in upcoming releases