NORMALIZATION

Modification anomalies

  1. A modification anomaly is a problem that could occur when the relation is updated.
  2. Deletion anomaly occurs when facts about two themes are lost with one deletion.
  3. Insertion anomaly occurs when the insertion of a fact about one theme requires a fact about a second theme.
  4. Referential integrity constraint is the condition in a database in which all interrelation constraints are satisfied.

Essence of Normalization

  1. Just as a paragraph should have a single theme, so should a relation have a single theme.
  2. When relations have more than one theme, they are vulnerable to modification anomalies.
  3. When a relation has more than one theme, break it into two (or more) relations, each having a single theme.
  4. A single theme is defined by a determinant and the attributes it determines.

Classes of Relations

  1. Increasing normal forms exhibit fewer anomalies.
  2. There is a procedure for getting from a normal form to next higher, except for domain/key normal form (DK/NF).
  3. Domain/key normal form relations have no anomalies for non-temporal data.
  4. There is no fixed procedure to put a relation into DK/NF.

First thru fifth normal forms

  1. A relation is placed into a higher normal form by splitting it to satisfy the following normal form definitions.

First normal form: any table of data that meets the definition of a relation.

Second normal form (2NF)

  1. Anomalies can occur when attributes are dependent on only part of a multi-attribute key.
  2. A relation is in second normal form when all non-key attributes are dependent on the whole key.
  3. That is, no attribute is dependent on only a part of the key.
  4. Any relation having a key with a single attribute is in second normal form.

Third normal form (3NF)

  1. Anomalies can occur when a relation contains one or more transitive dependencies.
  2. A transitive dependency exists when A-->B-->C and NOT B-->A.
  3. A relation is in 3NF when it is in 2NF and has no transitive dependencies.
  4. A relation is in 3NF when "All non-key attributes are dependent on the key, the whole key and nothing but the key".

Boyce-Codd Normal (BCNF)

  1. Anomalies can occur in relations in 3NF if there are determinants in the relation that are not candidate keys.
  2. A relation is in BCNF if every determinant is a candidate key.

Fourth normal form (4NF)

  1. Multivalued dependencies can occur in relations in BCNF.
  2. A multivalued dependency (MVD) occurs when a single attribute, say A, multidetermines two other independent attributes, say B and C. That is, A-->>B and A-->>C.
  3. X-->>Y, ie X multidetermines Y, when for each value of X we can have more than one value of Y.
  4. A relation is in 4NF when it is in BCNF and has no multivalued dependencies.

Fifth normal form (5NF)

  1. Deals with obscure join dependencies.
  2. The consequences (if any) of such dependencies are not known.

Domain/Key normal form (DK/NF)

  1. A relation in DK/NF has no modification anomalies, and conversely.
  2. DK/NF is the ultimate normal form; there is no higher normal form related to modification anomalies
  3. Defn: A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
  4. Constraint is any rule governing static values of attributes that is precise enough to be ascertained whether or not it is true
  5. E.g. edit rules, intra-relation and inter-relation constraints, functional and multi-valued dependencies.
  6. Not including constraints on changes in data values or time-dependent constraints.
  7. Key - the unique identifier of a tuple.
  8. Domain: physical and a logical description of an attributes allowed values.
  9. Physical description is the format of an attribute.
  10. Logical description is a further restriction of the values the domain is allowed
  11. Logical consequence: find a constraint on keys and/or domains which, if it is enforced, means that the desired constraint is also enforced.
  12. Bottom line on DK/NF: If every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can them be expressed as domain constraints.
  13. Practical consequence: Since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.

The synthesis of relations

  1. An alternative approach to designing relations that will not have modification anomalies is to use functional dependencies to synthesize relations.
  2. Assume no attribute belongs to any relation and join them into relations according to their functional dependencies.
  3. Attributes that are related can be related in 1 of 3 ways: 1-to-1, many-to-1, many-to-many.
  4. 1-to-1: A-->B and B-->A, so A and B must occur together in at least one relation.
  5. Many-to-1: A-->B, but NOT B-->A, so A is the determinant and is the key of the relation and B is a non-key attribute.
  6. Many-to-many: NOT A-->B and NOT B-->A, so the composite of both attributes must be the key of the relation, i.e. an intersection table.
  7. Not practical for large numbers of attributes because there are too many pairs of attributes to check.