NORMALIZATION
Modification anomalies
- A modification anomaly is a problem that could occur when the relation is
updated.
- Deletion anomaly occurs when facts about two themes are lost with one
deletion.
- Insertion anomaly occurs when the insertion of a fact about one theme
requires a fact about a second theme.
- Referential integrity constraint is the condition in a database in which
all interrelation constraints are satisfied.
Essence of Normalization
- Just as a paragraph should have a single theme, so should a relation have
a single theme.
- When relations have more than one theme, they are vulnerable to
modification anomalies.
- When a relation has more than one theme, break it into two (or more)
relations, each having a single theme.
- A single theme is defined by a determinant and the attributes it
determines.
Classes of Relations
- Increasing normal forms exhibit fewer anomalies.
- There is a procedure for getting from a normal form to next higher, except
for domain/key normal form (DK/NF).
- Domain/key normal form relations have no anomalies for non-temporal data.
- There is no fixed procedure to put a relation into DK/NF.
First thru fifth normal forms
- A relation is placed into a higher normal form by splitting it to
satisfy the following normal form definitions.
First normal form: any table of data that meets the definition of a
relation.
Second normal form (2NF)
- Anomalies can occur when attributes are dependent on only part of a
multi-attribute key.
- A relation is in second normal form when all non-key attributes are
dependent on the whole key.
- That is, no attribute is dependent on only a part of the key.
- Any relation having a key with a single attribute is in second normal
form.
Third normal form (3NF)
- Anomalies can occur when a relation contains one or more transitive
dependencies.
- A transitive dependency exists when A-->B-->C and NOT B-->A.
- A relation is in 3NF when it is in 2NF and has no transitive dependencies.
- A relation is in 3NF when "All non-key attributes are dependent on
the key, the whole key and nothing but the key".
Boyce-Codd Normal (BCNF)
- Anomalies can occur in relations in 3NF if there are determinants in the
relation that are not candidate keys.
- A relation is in BCNF if every determinant is a candidate key.
Fourth normal form (4NF)
- Multivalued dependencies can occur in relations in BCNF.
- A multivalued dependency (MVD) occurs when a single attribute, say A,
multidetermines two other independent attributes, say B and C. That is,
A-->>B and A-->>C.
- X-->>Y, ie X multidetermines Y, when for each value of X we can have
more than one value of Y.
- A relation is in 4NF when it is in BCNF and has no multivalued
dependencies.
Fifth normal form (5NF)
- Deals with obscure join dependencies.
- The consequences (if any) of such dependencies are not known.
Domain/Key normal form (DK/NF)
- A relation in DK/NF has no modification anomalies, and conversely.
- DK/NF is the ultimate normal form; there is no higher normal form related
to modification anomalies
- Defn: A relation is in DK/NF if every constraint on the relation is
a logical consequence of the definition of keys and domains.
- Constraint
is any rule governing static values of attributes
that is precise enough to be ascertained whether or not it is true
- E.g. edit rules, intra-relation and inter-relation constraints, functional
and multi-valued dependencies.
- Not including constraints on changes in data values or time-dependent
constraints.
- Key
- the unique identifier of a tuple.
- Domain
: physical and a logical description of an attributes allowed
values.
- Physical description is the format of an attribute.
- Logical description is a further restriction of the values the domain is
allowed
- Logical consequence:
find a constraint on keys and/or domains which,
if it is enforced, means that the desired constraint is also enforced.
- Bottom line on DK/NF: If every table has a single theme, then all
functional dependencies will be logical consequences of keys. All data value
constraints can them be expressed as domain constraints.
- Practical consequence: Since keys are enforced by the DBMS and domains are
enforced by edit checks on data input, all modification anomalies can be
avoided by just these two simple measures.
The synthesis of relations
- An alternative approach to designing relations that will not have
modification anomalies is to use functional dependencies to synthesize
relations.
- Assume no attribute belongs to any relation and join them into relations
according to their functional dependencies.
- Attributes that are related can be related in 1 of 3 ways: 1-to-1,
many-to-1, many-to-many.
- 1-to-1: A-->B and B-->A, so A and B must occur together in at least
one relation.
- Many-to-1: A-->B, but NOT B-->A, so A is the determinant and is the
key of the relation and B is a non-key attribute.
- Many-to-many: NOT A-->B and NOT B-->A, so the composite of both
attributes must be the key of the relation, i.e. an intersection table.
- Not practical for large numbers of attributes because there are too many
pairs of attributes to check.