+ All documents
Home > Documents > 3 THE RELATIONAL MODEL

3 THE RELATIONAL MODEL

Date post: 03-Apr-2023
Category:
Upload: khangminh22
View: 2 times
Download: 0 times
Share this document with a friend
38
3 THE RELATIONAL MODEL TABLE: An arrangement of words, numbers, or signs, or combinations of them, as in parallel columns, to exhibit a set of facts or relations in a definite, compact, and comprehensive form; a synopsis or scheme. Webster’s Dictionary of the English Language Codd proposed the relational data model in 1970. At that time most database systems were based on one of two older data models (the hierarchical model and the network model); the relational model revolutionized the database field and largely supplanted these earlier models. Prototype relational database management systems were devel- oped in pioneering research projects at IBM and UC-Berkeley by the mid-70s, and several vendors were offering relational database products shortly thereafter. Today, the relational model is by far the dominant data model and is the foundation for the leading DBMS products, including IBM’s DB2 family, Informix, Oracle, Sybase, Mi- crosoft’s Access and SQLServer, FoxBase, and Paradox. Relational database systems are ubiquitous in the marketplace and represent a multibillion dollar industry. The relational model is very simple and elegant; a database is a collection of one or more relations, where each relation is a table with rows and columns. This simple tabular representation enables even novice users to understand the contents of a database, and it permits the use of simple, high-level languages to query the data. The major advantages of the relational model over the older data models are its simple data representation and the ease with which even complex queries can be expressed. This chapter introduces the relational model and covers the following issues: How is data represented? What kinds of integrity constraints can be expressed? How can data be created and modified? How can data be manipulated and queried? How do we obtain a database design in the relational model? How are logical and physical data independence achieved? 51
Transcript

3THE RELATIONAL MODEL

TABLE: An arrangement of words, numbers, or signs, or combinations of them, as

in parallel columns, to exhibit a set of facts or relations in a definite, compact, and

comprehensive form; a synopsis or scheme.

—Webster’s Dictionary of the English Language

Codd proposed the relational data model in 1970. At that time most database systems

were based on one of two older data models (the hierarchical model and the network

model); the relational model revolutionized the database field and largely supplanted

these earlier models. Prototype relational database management systems were devel-

oped in pioneering research projects at IBM and UC-Berkeley by the mid-70s, and

several vendors were offering relational database products shortly thereafter. Today,

the relational model is by far the dominant data model and is the foundation for the

leading DBMS products, including IBM’s DB2 family, Informix, Oracle, Sybase, Mi-

crosoft’s Access and SQLServer, FoxBase, and Paradox. Relational database systems

are ubiquitous in the marketplace and represent a multibillion dollar industry.

The relational model is very simple and elegant; a database is a collection of one or more

relations, where each relation is a table with rows and columns. This simple tabular

representation enables even novice users to understand the contents of a database,

and it permits the use of simple, high-level languages to query the data. The major

advantages of the relational model over the older data models are its simple data

representation and the ease with which even complex queries can be expressed.

This chapter introduces the relational model and covers the following issues:

How is data represented?

What kinds of integrity constraints can be expressed?

How can data be created and modified?

How can data be manipulated and queried?

How do we obtain a database design in the relational model?

How are logical and physical data independence achieved?

51

52 Chapter 3

SQL: It was the query language of the pioneering System-R relational DBMS

developed at IBM. Over the years, SQL has become the most widely used language

for creating, manipulating, and querying relational DBMSs. Since many vendors

offer SQL products, there is a need for a standard that defines ‘official SQL.’

The existence of a standard allows users to measure a given vendor’s version of

SQL for completeness. It also allows users to distinguish SQL features that are

specific to one product from those that are standard; an application that relies on

non-standard features is less portable.

The first SQL standard was developed in 1986 by the American National Stan-

dards Institute (ANSI), and was called SQL-86. There was a minor revision in

1989 called SQL-89, and a major revision in 1992 called SQL-92. The Interna-

tional Standards Organization (ISO) collaborated with ANSI to develop SQL-92.

Most commercial DBMSs currently support SQL-92. An exciting development is

the imminent approval of SQL:1999, a major extension of SQL-92. While the cov-

erage of SQL in this book is based upon SQL-92, we will cover the main extensions

of SQL:1999 as well.

While we concentrate on the underlying concepts, we also introduce the Data Def-

inition Language (DDL) features of SQL-92, the standard language for creating,

manipulating, and querying data in a relational DBMS. This allows us to ground the

discussion firmly in terms of real database systems.

We discuss the concept of a relation in Section 3.1 and show how to create relations

using the SQL language. An important component of a data model is the set of

constructs it provides for specifying conditions that must be satisfied by the data. Such

conditions, called integrity constraints (ICs), enable the DBMS to reject operations that

might corrupt the data. We present integrity constraints in the relational model in

Section 3.2, along with a discussion of SQL support for ICs. We discuss how a DBMS

enforces integrity constraints in Section 3.3. In Section 3.4 we turn to the mechanism

for accessing and retrieving data from the database, query languages, and introduce

the querying features of SQL, which we examine in greater detail in a later chapter.

We then discuss the step of converting an ER diagram into a relational database schema

in Section 3.5. Finally, we introduce views, or tables defined using queries, in Section

3.6. Views can be used to define the external schema for a database and thus provide

the support for logical data independence in the relational model.

3.1 INTRODUCTION TO THE RELATIONAL MODEL

The main construct for representing data in the relational model is a relation. A

relation consists of a relation schema and a relation instance. The relation instance

The Relational Model 53

is a table, and the relation schema describes the column heads for the table. We first

describe the relation schema and then the relation instance. The schema specifies the

relation’s name, the name of each field (or column, or attribute), and the domain

of each field. A domain is referred to in a relation schema by the domain name and

has a set of associated values.

We use the example of student information in a university database from Chapter 1

to illustrate the parts of a relation schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)

This says, for instance, that the field named sid has a domain named string. The set

of values associated with domain string is the set of all character strings.

We now turn to the instances of a relation. An instance of a relation is a set of

tuples, also called records, in which each tuple has the same number of fields as the

relation schema. A relation instance can be thought of as a table in which each tuple

is a row, and all rows have the same number of fields. (The term relation instance is

often abbreviated to just relation, when there is no confusion with other aspects of a

relation such as its schema.)

An instance of the Students relation appears in Figure 3.1. The instance S1 contains

53831

53832

53650

53688

53666

50000 3.3

3.4

3.2

3.8

1.8

2.0

19

18

18

19

11

12

madayan@music

guldu@music

smith@math

smith@ee

jones@cs

dave@cs

Madayan

Guldu

Smith

Smith

Jones

Dave

sid age gpaloginname

TUPLES

(RECORDS, ROWS)

FIELDS (ATTRIBUTES, COLUMNS)

Field names

Figure 3.1 An Instance S1 of the Students Relation

six tuples and has, as we expect from the schema, five fields. Note that no two rows

are identical. This is a requirement of the relational model—each relation is defined

to be a set of unique tuples or rows.1 The order in which the rows are listed is not

important. Figure 3.2 shows the same relation instance. If the fields are named, as in

1In practice, commercial systems allow tables to have duplicate rows, but we will assume that arelation is indeed a set of tuples unless otherwise noted.

54 Chapter 3

sid name login age gpa

53831 Madayan madayan@music 11 1.8

53832 Guldu guldu@music 12 2.0

53688 Smith smith@ee 18 3.2

53650 Smith smith@math 19 3.8

53666 Jones jones@cs 18 3.4

50000 Dave dave@cs 19 3.3

Figure 3.2 An Alternative Representation of Instance S1 of Students

our schema definitions and figures depicting relation instances, the order of fields does

not matter either. However, an alternative convention is to list fields in a specific order

and to refer to a field by its position. Thus sid is field 1 of Students, login is field 3,

and so on. If this convention is used, the order of fields is significant. Most database

systems use a combination of these conventions. For example, in SQL the named fields

convention is used in statements that retrieve tuples, and the ordered fields convention

is commonly used when inserting tuples.

A relation schema specifies the domain of each field or column in the relation instance.

These domain constraints in the schema specify an important condition that we

want each instance of the relation to satisfy: The values that appear in a column must

be drawn from the domain associated with that column. Thus, the domain of a field

is essentially the type of that field, in programming language terms, and restricts the

values that can appear in the field.

More formally, let R(f1:D1, . . ., fn:Dn) be a relation schema, and for each fi, 1 ≤ i ≤ n,

let Domi be the set of values associated with the domain named Di. An instance of R

that satisfies the domain constraints in the schema is a set of tuples with n fields:

{ 〈f1 : d1, . . . , fn : dn〉 | d1 ∈ Dom1, . . . , dn ∈ Domn }

The angular brackets 〈. . .〉 identify the fields of a tuple. Using this notation, the first

Students tuple shown in Figure 3.1 is written as 〈sid: 50000, name: Dave, login:

dave@cs, age: 19, gpa: 3.3〉. The curly brackets {. . .} denote a set (of tuples, in this

definition). The vertical bar | should be read ‘such that,’ the symbol ∈ should be read

‘in,’ and the expression to the right of the vertical bar is a condition that must be

satisfied by the field values of each tuple in the set. Thus, an instance of R is defined

as a set of tuples. The fields of each tuple must correspond to the fields in the relation

schema.

Domain constraints are so fundamental in the relational model that we will henceforth

consider only relation instances that satisfy them; therefore, relation instance means

relation instance that satisfies the domain constraints in the relation schema.

The Relational Model 55

The degree, also called arity, of a relation is the number of fields. The cardinality

of a relation instance is the number of tuples in it. In Figure 3.1, the degree of the

relation (the number of columns) is five, and the cardinality of this instance is six.

A relational database is a collection of relations with distinct relation names. The

relational database schema is the collection of schemas for the relations in the

database. For example, in Chapter 1, we discussed a university database with rela-

tions called Students, Faculty, Courses, Rooms, Enrolled, Teaches, and Meets In. An

instance of a relational database is a collection of relation instances, one per rela-

tion schema in the database schema; of course, each relation instance must satisfy the

domain constraints in its schema.

3.1.1 Creating and Modifying Relations Using SQL-92

The SQL-92 language standard uses the word table to denote relation, and we will

often follow this convention when discussing SQL. The subset of SQL that supports

the creation, deletion, and modification of tables is called the Data Definition Lan-

guage (DDL). Further, while there is a command that lets users define new domains,

analogous to type definition commands in a programming language, we postpone a dis-

cussion of domain definition until Section 5.11. For now, we will just consider domains

that are built-in types, such as integer.

The CREATE TABLE statement is used to define a new table.2 To create the Students

relation, we can use the following statement:

CREATE TABLE Students ( sid CHAR(20),

name CHAR(30),

login CHAR(20),

age INTEGER,

gpa REAL )

Tuples are inserted using the INSERT command. We can insert a single tuple into the

Students table as follows:

INSERT

INTO Students (sid, name, login, age, gpa)

VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)

We can optionally omit the list of column names in the INTO clause and list the values

in the appropriate order, but it is good style to be explicit about column names.

2SQL also provides statements to destroy tables and to change the columns associated with a table;we discuss these in Section 3.7.

56 Chapter 3

We can delete tuples using the DELETE command. We can delete all Students tuples

with name equal to Smith using the command:

DELETE

FROM Students S

WHERE S.name = ‘Smith’

We can modify the column values in an existing row using the UPDATE command. For

example, we can increment the age and decrement the gpa of the student with sid

53688:

UPDATE Students S

SET S.age = S.age + 1, S.gpa = S.gpa - 1

WHERE S.sid = 53688

These examples illustrate some important points. The WHERE clause is applied first

and determines which rows are to be modified. The SET clause then determines how

these rows are to be modified. If the column that is being modified is also used to

determine the new value, the value used in the expression on the right side of equals

(=) is the old value, that is, before the modification. To illustrate these points further,

consider the following variation of the previous query:

UPDATE Students S

SET S.gpa = S.gpa - 0.1

WHERE S.gpa >= 3.3

If this query is applied on the instance S1 of Students shown in Figure 3.1, we obtain

the instance shown in Figure 3.3.

sid name login age gpa

50000 Dave dave@cs 19 3.2

53666 Jones jones@cs 18 3.3

53688 Smith smith@ee 18 3.2

53650 Smith smith@math 19 3.7

53831 Madayan madayan@music 11 1.8

53832 Guldu guldu@music 12 2.0

Figure 3.3 Students Instance S1 after Update

3.2 INTEGRITY CONSTRAINTS OVER RELATIONS

A database is only as good as the information stored in it, and a DBMS must therefore

help prevent the entry of incorrect information. An integrity constraint (IC) is a

The Relational Model 57

condition that is specified on a database schema, and restricts the data that can be

stored in an instance of the database. If a database instance satisfies all the integrity

constraints specified on the database schema, it is a legal instance. A DBMS enforces

integrity constraints, in that it permits only legal instances to be stored in the database.

Integrity constraints are specified and enforced at different times:

1. When the DBA or end user defines a database schema, he or she specifies the ICs

that must hold on any instance of this database.

2. When a database application is run, the DBMS checks for violations and disallows

changes to the data that violate the specified ICs. (In some situations, rather than

disallow the change, the DBMS might instead make some compensating changes

to the data to ensure that the database instance satisfies all ICs. In any case,

changes to the database are not allowed to create an instance that violates any

IC.)

Many kinds of integrity constraints can be specified in the relational model. We have

already seen one example of an integrity constraint in the domain constraints associated

with a relation schema (Section 3.1). In general, other kinds of constraints can be

specified as well; for example, no two students have the same sid value. In this section

we discuss the integrity constraints, other than domain constraints, that a DBA or

user can specify in the relational model.

3.2.1 Key Constraints

Consider the Students relation and the constraint that no two students have the same

student id. This IC is an example of a key constraint. A key constraint is a statement

that a certain minimal subset of the fields of a relation is a unique identifier for a tuple.

A set of fields that uniquely identifies a tuple according to a key constraint is called

a candidate key for the relation; we often abbreviate this to just key. In the case of

the Students relation, the (set of fields containing just the) sid field is a candidate key.

Let us take a closer look at the above definition of a (candidate) key. There are two

parts to the definition:3

1. Two distinct tuples in a legal instance (an instance that satisfies all ICs, including

the key constraint) cannot have identical values in all the fields of a key.

2. No subset of the set of fields in a key is a unique identifier for a tuple.

3The term key is rather overworked. In the context of access methods, we speak of search keys,which are quite different.

58 Chapter 3

The first part of the definition means that in any legal instance, the values in the key

fields uniquely identify a tuple in the instance. When specifying a key constraint, the

DBA or user must be sure that this constraint will not prevent them from storing a

‘correct’ set of tuples. (A similar comment applies to the specification of other kinds

of ICs as well.) The notion of ‘correctness’ here depends upon the nature of the data

being stored. For example, several students may have the same name, although each

student has a unique student id. If the name field is declared to be a key, the DBMS

will not allow the Students relation to contain two tuples describing different students

with the same name!

The second part of the definition means, for example, that the set of fields {sid, name}

is not a key for Students, because this set properly contains the key {sid}. The set

{sid, name} is an example of a superkey, which is a set of fields that contains a key.

Look again at the instance of the Students relation in Figure 3.1. Observe that two

different rows always have different sid values; sid is a key and uniquely identifies a

tuple. However, this does not hold for nonkey fields. For example, the relation contains

two rows with Smith in the name field.

Note that every relation is guaranteed to have a key. Since a relation is a set of tuples,

the set of all fields is always a superkey. If other constraints hold, some subset of the

fields may form a key, but if not, the set of all fields is a key.

A relation may have several candidate keys. For example, the login and age fields of

the Students relation may, taken together, also identify students uniquely. That is,

{login, age} is also a key. It may seem that login is a key, since no two rows in the

example instance have the same login value. However, the key must identify tuples

uniquely in all possible legal instances of the relation. By stating that {login, age} is

a key, the user is declaring that two students may have the same login or age, but not

both.

Out of all the available candidate keys, a database designer can identify a primary

key. Intuitively, a tuple can be referred to from elsewhere in the database by storing

the values of its primary key fields. For example, we can refer to a Students tuple by

storing its sid value. As a consequence of referring to student tuples in this manner,

tuples are frequently accessed by specifying their sid value. In principle, we can use

any key, not just the primary key, to refer to a tuple. However, using the primary key is

preferable because it is what the DBMS expects—this is the significance of designating

a particular candidate key as a primary key—and optimizes for. For example, the

DBMS may create an index with the primary key fields as the search key, to make

the retrieval of a tuple given its primary key value efficient. The idea of referring to a

tuple is developed further in the next section.

The Relational Model 59

Specifying Key Constraints in SQL-92

In SQL we can declare that a subset of the columns of a table constitute a key by

using the UNIQUE constraint. At most one of these ‘candidate’ keys can be declared

to be a primary key, using the PRIMARY KEY constraint. (SQL does not require that

such constraints be declared for a table.)

Let us revisit our example table definition and specify key information:

CREATE TABLE Students ( sid CHAR(20),

name CHAR(30),

login CHAR(20),

age INTEGER,

gpa REAL,

UNIQUE (name, age),

CONSTRAINT StudentsKey PRIMARY KEY (sid) )

This definition says that sid is the primary key and that the combination of name and

age is also a key. The definition of the primary key also illustrates how we can name

a constraint by preceding it with CONSTRAINT constraint-name. If the constraint is

violated, the constraint name is returned and can be used to identify the error.

3.2.2 Foreign Key Constraints

Sometimes the information stored in a relation is linked to the information stored in

another relation. If one of the relations is modified, the other must be checked, and

perhaps modified, to keep the data consistent. An IC involving both relations must

be specified if a DBMS is to make such checks. The most common IC involving two

relations is a foreign key constraint.

Suppose that in addition to Students, we have a second relation:

Enrolled(sid: string, cid: string, grade: string)

To ensure that only bona fide students can enroll in courses, any value that appears in

the sid field of an instance of the Enrolled relation should also appear in the sid field

of some tuple in the Students relation. The sid field of Enrolled is called a foreign

key and refers to Students. The foreign key in the referencing relation (Enrolled, in

our example) must match the primary key of the referenced relation (Students), i.e.,

it must have the same number of columns and compatible data types, although the

column names can be different.

This constraint is illustrated in Figure 3.4. As the figure shows, there may well be

some students who are not referenced from Enrolled (e.g., the student with sid=50000).

60 Chapter 3

However, every sid value that appears in the instance of the Enrolled table appears in

the primary key column of a row in the Students table.

cid grade sid

53831

53832

53650

53688

53666

50000Carnatic101

Reggae203

Topology112

History105

53831

53832

53650

53666

3.3

3.4

3.2

3.8

1.8

2.0

19

18

18

19

11

12

madayan@music

guldu@music

smith@math

smith@ee

jones@cs

dave@cs

Madayan

Guldu

Smith

Smith

Jones

DaveC

B

A

B

Enrolled (Referencing relation) (Referenced relation)Students

Primary keyForeign key

sid age gpaloginname

Figure 3.4 Referential Integrity

If we try to insert the tuple 〈55555, Art104, A〉 into E1, the IC is violated because

there is no tuple in S1 with the id 55555; the database system should reject such

an insertion. Similarly, if we delete the tuple 〈53666, Jones, jones@cs, 18, 3.4〉 from

S1, we violate the foreign key constraint because the tuple 〈53666, History105, B〉

in E1 contains sid value 53666, the sid of the deleted Students tuple. The DBMS

should disallow the deletion or, perhaps, also delete the Enrolled tuple that refers to

the deleted Students tuple. We discuss foreign key constraints and their impact on

updates in Section 3.3.

Finally, we note that a foreign key could refer to the same relation. For example,

we could extend the Students relation with a column called partner and declare this

column to be a foreign key referring to Students. Intuitively, every student could then

have a partner, and the partner field contains the partner’s sid. The observant reader

will no doubt ask, “What if a student does not (yet) have a partner?” This situation

is handled in SQL by using a special value called null. The use of null in a field of a

tuple means that value in that field is either unknown or not applicable (e.g., we do not

know the partner yet, or there is no partner). The appearance of null in a foreign key

field does not violate the foreign key constraint. However, null values are not allowed

to appear in a primary key field (because the primary key fields are used to identify a

tuple uniquely). We will discuss null values further in Chapter 5.

Specifying Foreign Key Constraints in SQL-92

Let us define Enrolled(sid: string, cid: string, grade: string):

CREATE TABLE Enrolled ( sid CHAR(20),

The Relational Model 61

cid CHAR(20),

grade CHAR(10),

PRIMARY KEY (sid, cid),

FOREIGN KEY (sid) REFERENCES Students )

The foreign key constraint states that every sid value in Enrolled must also appear in

Students, that is, sid in Enrolled is a foreign key referencing Students. Incidentally,

the primary key constraint states that a student has exactly one grade for each course

that he or she is enrolled in. If we want to record more than one grade per student

per course, we should change the primary key constraint.

3.2.3 General Constraints

Domain, primary key, and foreign key constraints are considered to be a fundamental

part of the relational data model and are given special attention in most commercial

systems. Sometimes, however, it is necessary to specify more general constraints.

For example, we may require that student ages be within a certain range of values;

given such an IC specification, the DBMS will reject inserts and updates that violate

the constraint. This is very useful in preventing data entry errors. If we specify that

all students must be at least 16 years old, the instance of Students shown in Figure

3.1 is illegal because two students are underage. If we disallow the insertion of these

two tuples, we have a legal instance, as shown in Figure 3.5.

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

53650 Smith smith@math 19 3.8

Figure 3.5 An Instance S2 of the Students Relation

The IC that students must be older than 16 can be thought of as an extended domain

constraint, since we are essentially defining the set of permissible age values more strin-

gently than is possible by simply using a standard domain such as integer. In general,

however, constraints that go well beyond domain, key, or foreign key constraints can

be specified. For example, we could require that every student whose age is greater

than 18 must have a gpa greater than 3.

Current relational database systems support such general constraints in the form of

table constraints and assertions. Table constraints are associated with a single table

and are checked whenever that table is modified. In contrast, assertions involve several

62 Chapter 3

tables and are checked whenever any of these tables is modified. Both table constraints

and assertions can use the full power of SQL queries to specify the desired restriction.

We discuss SQL support for table constraints and assertions in Section 5.11 because a

full appreciation of their power requires a good grasp of SQL’s query capabilities.

3.3 ENFORCING INTEGRITY CONSTRAINTS

As we observed earlier, ICs are specified when a relation is created and enforced when

a relation is modified. The impact of domain, PRIMARY KEY, and UNIQUE constraints

is straightforward: if an insert, delete, or update command causes a violation, it is

rejected. Potential IC violation is generally checked at the end of each SQL statement

execution, although it can be deferred until the end of the transaction executing the

statement, as we will see in Chapter 18.

Consider the instance S1 of Students shown in Figure 3.1. The following insertion

violates the primary key constraint because there is already a tuple with the sid 53688,

and it will be rejected by the DBMS:

INSERT

INTO Students (sid, name, login, age, gpa)

VALUES (53688, ‘Mike’, ‘mike@ee’, 17, 3.4)

The following insertion violates the constraint that the primary key cannot contain

null:

INSERT

INTO Students (sid, name, login, age, gpa)

VALUES (null, ‘Mike’, ‘mike@ee’, 17, 3.4)

Of course, a similar problem arises whenever we try to insert a tuple with a value in

a field that is not in the domain associated with that field, i.e., whenever we violate

a domain constraint. Deletion does not cause a violation of domain, primary key or

unique constraints. However, an update can cause violations, similar to an insertion:

UPDATE Students S

SET S.sid = 50000

WHERE S.sid = 53688

This update violates the primary key constraint because there is already a tuple with

sid 50000.

The impact of foreign key constraints is more complex because SQL sometimes tries to

rectify a foreign key constraint violation instead of simply rejecting the change. We will

The Relational Model 63

discuss the referential integrity enforcement steps taken by the DBMS in terms

of our Enrolled and Students tables, with the foreign key constraint that Enrolled.sid

is a reference to (the primary key of) Students.

In addition to the instance S1 of Students, consider the instance of Enrolled shown

in Figure 3.4. Deletions of Enrolled tuples do not violate referential integrity, but

insertions of Enrolled tuples could. The following insertion is illegal because there is

no student with sid 51111:

INSERT

INTO Enrolled (cid, grade, sid)

VALUES (‘Hindi101’, ‘B’, 51111)

On the other hand, insertions of Students tuples do not violate referential integrity

although deletions could. Further, updates on either Enrolled or Students that change

the sid value could potentially violate referential integrity.

SQL-92 provides several alternative ways to handle foreign key violations. We must

consider three basic questions:

1. What should we do if an Enrolled row is inserted, with a sid column value that

does not appear in any row of the Students table?

In this case the INSERT command is simply rejected.

2. What should we do if a Students row is deleted?

The options are:

Delete all Enrolled rows that refer to the deleted Students row.

Disallow the deletion of the Students row if an Enrolled row refers to it.

Set the sid column to the sid of some (existing) ‘default’ student, for every

Enrolled row that refers to the deleted Students row.

For every Enrolled row that refers to it, set the sid column to null. In our

example, this option conflicts with the fact that sid is part of the primary

key of Enrolled and therefore cannot be set to null. Thus, we are limited to

the first three options in our example, although this fourth option (setting

the foreign key to null) is available in the general case.

3. What should we do if the primary key value of a Students row is updated?

The options here are similar to the previous case.

SQL-92 allows us to choose any of the four options on DELETE and UPDATE. For exam-

ple, we can specify that when a Students row is deleted, all Enrolled rows that refer to

it are to be deleted as well, but that when the sid column of a Students row is modified,

this update is to be rejected if an Enrolled row refers to the modified Students row:

64 Chapter 3

CREATE TABLE Enrolled ( sid CHAR(20),

cid CHAR(20),

grade CHAR(10),

PRIMARY KEY (sid, cid),

FOREIGN KEY (sid) REFERENCES Students

ON DELETE CASCADE

ON UPDATE NO ACTION )

The options are specified as part of the foreign key declaration. The default option is

NO ACTION, which means that the action (DELETE or UPDATE) is to be rejected. Thus,

the ON UPDATE clause in our example could be omitted, with the same effect. The

CASCADE keyword says that if a Students row is deleted, all Enrolled rows that refer

to it are to be deleted as well. If the UPDATE clause specified CASCADE, and the sid

column of a Students row is updated, this update is also carried out in each Enrolled

row that refers to the updated Students row.

If a Students row is deleted, we can switch the enrollment to a ‘default’ student by using

ON DELETE SET DEFAULT. The default student is specified as part of the definition of

the sid field in Enrolled; for example, sid CHAR(20) DEFAULT ‘53666’. Although the

specification of a default value is appropriate in some situations (e.g., a default parts

supplier if a particular supplier goes out of business), it is really not appropriate to

switch enrollments to a default student. The correct solution in this example is to also

delete all enrollment tuples for the deleted student (that is, CASCADE), or to reject the

update.

SQL also allows the use of null as the default value by specifying ON DELETE SET NULL.

3.4 QUERYING RELATIONAL DATA

A relational database query (query, for short) is a question about the data, and the

answer consists of a new relation containing the result. For example, we might want

to find all students younger than 18 or all students enrolled in Reggae203. A query

language is a specialized language for writing queries.

SQL is the most popular commercial query language for a relational DBMS. We now

present some SQL examples that illustrate how easily relations can be queried. Con-

sider the instance of the Students relation shown in Figure 3.1. We can retrieve rows

corresponding to students who are younger than 18 with the following SQL query:

SELECT *

FROM Students S

WHERE S.age < 18

The Relational Model 65

The symbol * means that we retain all fields of selected tuples in the result. To

understand this query, think of S as a variable that takes on the value of each tuple

in Students, one tuple after the other. The condition S.age < 18 in the WHERE clause

specifies that we want to select only tuples in which the age field has a value less than

18. This query evaluates to the relation shown in Figure 3.6.

sid name login age gpa

53831 Madayan madayan@music 11 1.8

53832 Guldu guldu@music 12 2.0

Figure 3.6 Students with age < 18 on Instance S1

This example illustrates that the domain of a field restricts the operations that are

permitted on field values, in addition to restricting the values that can appear in the

field. The condition S.age < 18 involves an arithmetic comparison of an age value with

an integer and is permissible because the domain of age is the set of integers. On the

other hand, a condition such as S.age = S.sid does not make sense because it compares

an integer value with a string value, and this comparison is defined to fail in SQL; a

query containing this condition will produce no answer tuples.

In addition to selecting a subset of tuples, a query can extract a subset of the fields

of each selected tuple. We can compute the names and logins of students who are

younger than 18 with the following query:

SELECT S.name, S.login

FROM Students S

WHERE S.age < 18

Figure 3.7 shows the answer to this query; it is obtained by applying the selection

to the instance S1 of Students (to get the relation shown in Figure 3.6), followed by

removing unwanted fields. Note that the order in which we perform these operations

does matter—if we remove unwanted fields first, we cannot check the condition S.age

< 18, which involves one of those fields.

We can also combine information in the Students and Enrolled relations. If we want to

obtain the names of all students who obtained an A and the id of the course in which

they got an A, we could write the following query:

SELECT S.name, E.cid

FROM Students S, Enrolled E

WHERE S.sid = E.sid AND E.grade = ‘A’

66 Chapter 3

DISTINCT types in SQL: A comparison of two values drawn from different do-

mains should fail, even if the values are ‘compatible’ in the sense that both are

numeric or both are string values etc. For example, if salary and age are two dif-

ferent domains whose values are represented as integers, a comparison of a salary

value with an age value should fail. Unfortunately, SQL-92’s support for the con-

cept of domains does not go this far: We are forced to define salary and age as

integer types and the comparison S < A will succeed when S is bound to the

salary value 25 and A is bound to the age value 50. The latest version of the SQL

standard, called SQL:1999, addresses this problem, and allows us to define salary

and age as DISTINCT types even though their values are represented as integers.

Many systems, e.g., Informix UDS and IBM DB2, already support this feature.

name login

Madayan madayan@music

Guldu guldu@music

Figure 3.7 Names and Logins of Students under 18

This query can be understood as follows: “If there is a Students tuple S and an Enrolled

tuple E such that S.sid = E.sid (so that S describes the student who is enrolled in E)

and E.grade = ‘A’, then print the student’s name and the course id.” When evaluated

on the instances of Students and Enrolled in Figure 3.4, this query returns a single

tuple, 〈Smith, Topology112〉.

We will cover relational queries, and SQL in particular, in more detail in subsequent

chapters.

3.5 LOGICAL DATABASE DESIGN: ER TO RELATIONAL

The ER model is convenient for representing an initial, high-level database design.

Given an ER diagram describing a database, there is a standard approach to generating

a relational database schema that closely approximates the ER design. (The translation

is approximate to the extent that we cannot capture all the constraints implicit in the

ER design using SQL-92, unless we use certain SQL-92 constraints that are costly to

check.) We now describe how to translate an ER diagram into a collection of tables

with associated constraints, i.e., a relational database schema.

The Relational Model 67

3.5.1 Entity Sets to Tables

An entity set is mapped to a relation in a straightforward way: Each attribute of the

entity set becomes an attribute of the table. Note that we know both the domain of

each attribute and the (primary) key of an entity set.

Consider the Employees entity set with attributes ssn, name, and lot shown in Figure

3.8. A possible instance of the Employees entity set, containing three Employees

Employees

ssn

name

lot

Figure 3.8 The Employees Entity Set

entities, is shown in Figure 3.9 in a tabular format.

ssn name lot

123-22-3666 Attishoo 48

231-31-5368 Smiley 22

131-24-3650 Smethurst 35

Figure 3.9 An Instance of the Employees Entity Set

The following SQL statement captures the preceding information, including the domain

constraints and key information:

CREATE TABLE Employees ( ssn CHAR(11),

name CHAR(30),

lot INTEGER,

PRIMARY KEY (ssn) )

3.5.2 Relationship Sets (without Constraints) to Tables

A relationship set, like an entity set, is mapped to a relation in the relational model.

We begin by considering relationship sets without key and participation constraints,

and we discuss how to handle such constraints in subsequent sections. To represent

a relationship, we must be able to identify each participating entity and give values

68 Chapter 3

to the descriptive attributes of the relationship. Thus, the attributes of the relation

include:

The primary key attributes of each participating entity set, as foreign key fields.

The descriptive attributes of the relationship set.

The set of nondescriptive attributes is a superkey for the relation. If there are no key

constraints (see Section 2.4.1), this set of attributes is a candidate key.

Consider the Works In2 relationship set shown in Figure 3.10. Each department has

offices in several locations and we want to record the locations at which each employee

works.

dname

budgetdid

since

name

Employees

ssn lot

Locations

Departments

capacityaddress

Works_In2

Figure 3.10 A Ternary Relationship Set

All the available information about the Works In2 table is captured by the following

SQL definition:

CREATE TABLE Works In2 ( ssn CHAR(11),

did INTEGER,

address CHAR(20),

since DATE,

PRIMARY KEY (ssn, did, address),

FOREIGN KEY (ssn) REFERENCES Employees,

FOREIGN KEY (address) REFERENCES Locations,

FOREIGN KEY (did) REFERENCES Departments )

Note that the address, did, and ssn fields cannot take on null values. Because these

fields are part of the primary key for Works In2, a NOT NULL constraint is implicit

for each of these fields. This constraint ensures that these fields uniquely identify

a department, an employee, and a location in each tuple of Works In. We can also

The Relational Model 69

specify that a particular action is desired when a referenced Employees, Departments

or Locations tuple is deleted, as explained in the discussion of integrity constraints in

Section 3.2. In this chapter we assume that the default action is appropriate except

for situations in which the semantics of the ER diagram require some other action.

Finally, consider the Reports To relationship set shown in Figure 3.11. The role in-

Reports_To

name

Employees

subordinatesupervisor

ssn lot

Figure 3.11 The Reports To Relationship Set

dicators supervisor and subordinate are used to create meaningful field names in the

CREATE statement for the Reports To table:

CREATE TABLE Reports To (

supervisor ssn CHAR(11),

subordinate ssn CHAR(11),

PRIMARY KEY (supervisor ssn, subordinate ssn),

FOREIGN KEY (supervisor ssn) REFERENCES Employees(ssn),

FOREIGN KEY (subordinate ssn) REFERENCES Employees(ssn) )

Observe that we need to explicitly name the referenced field of Employees because the

field name differs from the name(s) of the referring field(s).

3.5.3 Translating Relationship Sets with Key Constraints

If a relationship set involves n entity sets and some m of them are linked via arrows

in the ER diagram, the key for any one of these m entity sets constitutes a key for

the relation to which the relationship set is mapped. Thus we have m candidate keys,

and one of these should be designated as the primary key. The translation discussed

in Section 2.3 from relationship sets to a relation can be used in the presence of key

constraints, taking into account this point about keys.

70 Chapter 3

Consider the relationship set Manages shown in Figure 3.12. The table corresponding

name dname

budgetdid

since

ManagesEmployees Departments

ssn lot

Figure 3.12 Key Constraint on Manages

to Manages has the attributes ssn, did, since. However, because each department has

at most one manager, no two tuples can have the same did value but differ on the ssn

value. A consequence of this observation is that did is itself a key for Manages; indeed,

the set did, ssn is not a key (because it is not minimal). The Manages relation can be

defined using the following SQL statement:

CREATE TABLE Manages ( ssn CHAR(11),

did INTEGER,

since DATE,

PRIMARY KEY (did),

FOREIGN KEY (ssn) REFERENCES Employees,

FOREIGN KEY (did) REFERENCES Departments )

A second approach to translating a relationship set with key constraints is often su-

perior because it avoids creating a distinct table for the relationship set. The idea

is to include the information about the relationship set in the table corresponding to

the entity set with the key, taking advantage of the key constraint. In the Manages

example, because a department has at most one manager, we can add the key fields of

the Employees tuple denoting the manager and the since attribute to the Departments

tuple.

This approach eliminates the need for a separate Manages relation, and queries asking

for a department’s manager can be answered without combining information from two

relations. The only drawback to this approach is that space could be wasted if several

departments have no managers. In this case the added fields would have to be filled

with null values. The first translation (using a separate table for Manages) avoids this

inefficiency, but some important queries require us to combine information from two

relations, which can be a slow operation.

The following SQL statement, defining a Dept Mgr relation that captures the informa-

tion in both Departments and Manages, illustrates the second approach to translating

relationship sets with key constraints:

The Relational Model 71

CREATE TABLE Dept Mgr ( did INTEGER,

dname CHAR(20),

budget REAL,

ssn CHAR(11),

since DATE,

PRIMARY KEY (did),

FOREIGN KEY (ssn) REFERENCES Employees )

Note that ssn can take on null values.

This idea can be extended to deal with relationship sets involving more than two entity

sets. In general, if a relationship set involves n entity sets and some m of them are

linked via arrows in the ER diagram, the relation corresponding to any one of the m

sets can be augmented to capture the relationship.

We discuss the relative merits of the two translation approaches further after consid-

ering how to translate relationship sets with participation constraints into tables.

3.5.4 Translating Relationship Sets with Participation Constraints

Consider the ER diagram in Figure 3.13, which shows two relationship sets, Manages

and Works In.

name dname

budgetdid

since

Manages

name dname

budgetdid

since

Manages

since

DepartmentsEmployees

ssn

Works_In

lot

Figure 3.13 Manages and Works In

72 Chapter 3

Every department is required to have a manager, due to the participation constraint,

and at most one manager, due to the key constraint. The following SQL statement

reflects the second translation approach discussed in Section 3.5.3, and uses the key

constraint:

CREATE TABLE Dept Mgr ( did INTEGER,

dname CHAR(20),

budget REAL,

ssn CHAR(11) NOT NULL,

since DATE,

PRIMARY KEY (did),

FOREIGN KEY (ssn) REFERENCES Employees

ON DELETE NO ACTION )

It also captures the participation constraint that every department must have a man-

ager: Because ssn cannot take on null values, each tuple of Dept Mgr identifies a tuple

in Employees (who is the manager). The NO ACTION specification, which is the default

and need not be explicitly specified, ensures that an Employees tuple cannot be deleted

while it is pointed to by a Dept Mgr tuple. If we wish to delete such an Employees

tuple, we must first change the Dept Mgr tuple to have a new employee as manager.

(We could have specified CASCADE instead of NO ACTION, but deleting all information

about a department just because its manager has been fired seems a bit extreme!)

The constraint that every department must have a manager cannot be captured using

the first translation approach discussed in Section 3.5.3. (Look at the definition of

Manages and think about what effect it would have if we added NOT NULL constraints

to the ssn and did fields. Hint: The constraint would prevent the firing of a manager,

but does not ensure that a manager is initially appointed for each department!) This

situation is a strong argument in favor of using the second approach for one-to-many

relationships such as Manages, especially when the entity set with the key constraint

also has a total participation constraint.

Unfortunately, there are many participation constraints that we cannot capture using

SQL-92, short of using table constraints or assertions. Table constraints and assertions

can be specified using the full power of the SQL query language (as discussed in

Section 5.11) and are very expressive, but also very expensive to check and enforce.

For example, we cannot enforce the participation constraints on the Works In relation

without using these general constraints. To see why, consider the Works In relation

obtained by translating the ER diagram into relations. It contains fields ssn and

did, which are foreign keys referring to Employees and Departments. To ensure total

participation of Departments in Works In, we have to guarantee that every did value in

Departments appears in a tuple of Works In. We could try to guarantee this condition

by declaring that did in Departments is a foreign key referring to Works In, but this

is not a valid foreign key constraint because did is not a candidate key for Works In.

The Relational Model 73

To ensure total participation of Departments in Works In using SQL-92, we need an

assertion. We have to guarantee that every did value in Departments appears in a

tuple of Works In; further, this tuple of Works In must also have non null values in

the fields that are foreign keys referencing other entity sets involved in the relationship

(in this example, the ssn field). We can ensure the second part of this constraint by

imposing the stronger requirement that ssn in Works In cannot contain null values.

(Ensuring that the participation of Employees in Works In is total is symmetric.)

Another constraint that requires assertions to express in SQL is the requirement that

each Employees entity (in the context of the Manages relationship set) must manage

at least one department.

In fact, the Manages relationship set exemplifies most of the participation constraints

that we can capture using key and foreign key constraints. Manages is a binary rela-

tionship set in which exactly one of the entity sets (Departments) has a key constraint,

and the total participation constraint is expressed on that entity set.

We can also capture participation constraints using key and foreign key constraints in

one other special situation: a relationship set in which all participating entity sets have

key constraints and total participation. The best translation approach in this case is

to map all the entities as well as the relationship into a single table; the details are

straightforward.

3.5.5 Translating Weak Entity Sets

A weak entity set always participates in a one-to-many binary relationship and has a

key constraint and total participation. The second translation approach discussed in

Section 3.5.3 is ideal in this case, but we must take into account the fact that the weak

entity has only a partial key. Also, when an owner entity is deleted, we want all owned

weak entities to be deleted.

Consider the Dependents weak entity set shown in Figure 3.14, with partial key pname.

A Dependents entity can be identified uniquely only if we take the key of the owning

Employees entity and the pname of the Dependents entity, and the Dependents entity

must be deleted if the owning Employees entity is deleted.

We can capture the desired semantics with the following definition of the Dep Policy

relation:

CREATE TABLE Dep Policy ( pname CHAR(20),

age INTEGER,

cost REAL,

ssn CHAR(11),

74 Chapter 3

name

agepname

DependentsEmployees

ssn

Policy

costlot

Figure 3.14 The Dependents Weak Entity Set

PRIMARY KEY (pname, ssn),

FOREIGN KEY (ssn) REFERENCES Employees

ON DELETE CASCADE )

Observe that the primary key is 〈pname, ssn〉, since Dependents is a weak entity. This

constraint is a change with respect to the translation discussed in Section 3.5.3. We

have to ensure that every Dependents entity is associated with an Employees entity

(the owner), as per the total participation constraint on Dependents. That is, ssn

cannot be null. This is ensured because ssn is part of the primary key. The CASCADE

option ensures that information about an employee’s policy and dependents is deleted

if the corresponding Employees tuple is deleted.

3.5.6 Translating Class Hierarchies

We present the two basic approaches to handling ISA hierarchies by applying them to

the ER diagram shown in Figure 3.15:

name

ISA

ssn

EmployeeEmployees

Hourly_Emps Contract_Emps

lot

contractidhours_worked

hourly_wages

Figure 3.15 Class Hierarchy

The Relational Model 75

1. We can map each of the entity sets Employees, Hourly Emps, and Contract Emps

to a distinct relation. The Employees relation is created as in Section 2.2. We

discuss Hourly Emps here; Contract Emps is handled similarly. The relation for

Hourly Emps includes the hourly wages and hours worked attributes of Hourly Emps.

It also contains the key attributes of the superclass (ssn, in this example), which

serve as the primary key for Hourly Emps, as well as a foreign key referencing

the superclass (Employees). For each Hourly Emps entity, the value of the name

and lot attributes are stored in the corresponding row of the superclass (Employ-

ees). Note that if the superclass tuple is deleted, the delete must be cascaded to

Hourly Emps.

2. Alternatively, we can create just two relations, corresponding to Hourly Emps

and Contract Emps. The relation for Hourly Emps includes all the attributes

of Hourly Emps as well as all the attributes of Employees (i.e., ssn, name, lot,

hourly wages, hours worked).

The first approach is general and is always applicable. Queries in which we want to

examine all employees and do not care about the attributes specific to the subclasses

are handled easily using the Employees relation. However, queries in which we want

to examine, say, hourly employees, may require us to combine Hourly Emps (or Con-

tract Emps, as the case may be) with Employees to retrieve name and lot.

The second approach is not applicable if we have employees who are neither hourly

employees nor contract employees, since there is no way to store such employees. Also,

if an employee is both an Hourly Emps and a Contract Emps entity, then the name

and lot values are stored twice. This duplication can lead to some of the anomalies

that we discuss in Chapter 15. A query that needs to examine all employees must now

examine two relations. On the other hand, a query that needs to examine only hourly

employees can now do so by examining just one relation. The choice between these

approaches clearly depends on the semantics of the data and the frequency of common

operations.

In general, overlap and covering constraints can be expressed in SQL-92 only by using

assertions.

3.5.7 Translating ER Diagrams with Aggregation

Translating aggregation into the relational model is easy because there is no real dis-

tinction between entities and relationships in the relational model.

Consider the ER diagram shown in Figure 3.16. The Employees, Projects, and De-

partments entity sets and the Sponsors relationship set are mapped as described in

previous sections. For the Monitors relationship set, we create a relation with the

following attributes: the key attributes of Employees (ssn), the key attributes of Spon-

76 Chapter 3

until

since

name

budgetdidpid

started_on

pbudget

dname

ssn

DepartmentsProjects Sponsors

Employees

Monitors

lot

Figure 3.16 Aggregation

sors (did, pid), and the descriptive attributes of Monitors (until). This translation is

essentially the standard mapping for a relationship set, as described in Section 3.5.2.

There is a special case in which this translation can be refined further by dropping

the Sponsors relation. Consider the Sponsors relation. It has attributes pid, did, and

since, and in general we need it (in addition to Monitors) for two reasons:

1. We have to record the descriptive attributes (in our example, since) of the Sponsors

relationship.

2. Not every sponsorship has a monitor, and thus some 〈pid, did〉 pairs in the Spon-

sors relation may not appear in the Monitors relation.

However, if Sponsors has no descriptive attributes and has total participation in Mon-

itors, every possible instance of the Sponsors relation can be obtained by looking at

the 〈pid, did〉 columns of the Monitors relation. Thus, we need not store the Sponsors

relation in this case.

3.5.8 ER to Relational: Additional Examples *

Consider the ER diagram shown in Figure 3.17. We can translate this ER diagram

into the relational model as follows, taking advantage of the key constraints to combine

Purchaser information with Policies and Beneficiary information with Dependents:

The Relational Model 77

name

agepname

DependentsEmployees

ssn

policyid cost

Beneficiary

lot

Policies

Purchaser

Figure 3.17 Policy Revisited

CREATE TABLE Policies ( policyid INTEGER,

cost REAL,

ssn CHAR(11) NOT NULL,

PRIMARY KEY (policyid),

FOREIGN KEY (ssn) REFERENCES Employees

ON DELETE CASCADE )

CREATE TABLE Dependents ( pname CHAR(20),

age INTEGER,

policyid INTEGER,

PRIMARY KEY (pname, policyid),

FOREIGN KEY (policyid) REFERENCES Policies

ON DELETE CASCADE )

Notice how the deletion of an employee leads to the deletion of all policies owned by

the employee and all dependents who are beneficiaries of those policies. Further, each

dependent is required to have a covering policy—because policyid is part of the primary

key of Dependents, there is an implicit NOT NULL constraint. This model accurately

reflects the participation constraints in the ER diagram and the intended actions when

an employee entity is deleted.

In general, there could be a chain of identifying relationships for weak entity sets. For

example, we assumed that policyid uniquely identifies a policy. Suppose that policyid

only distinguishes the policies owned by a given employee; that is, policyid is only a

partial key and Policies should be modeled as a weak entity set. This new assumption

78 Chapter 3

about policyid does not cause much to change in the preceding discussion. In fact,

the only changes are that the primary key of Policies becomes 〈policyid, ssn〉, and as

a consequence, the definition of Dependents changes—a field called ssn is added and

becomes part of both the primary key of Dependents and the foreign key referencing

Policies:

CREATE TABLE Dependents ( pname CHAR(20),

ssn CHAR(11),

age INTEGER,

policyid INTEGER NOT NULL,

PRIMARY KEY (pname, policyid, ssn),

FOREIGN KEY (policyid, ssn) REFERENCES Policies

ON DELETE CASCADE)

3.6 INTRODUCTION TO VIEWS

A view is a table whose rows are not explicitly stored in the database but are computed

as needed from a view definition. Consider the Students and Enrolled relations.

Suppose that we are often interested in finding the names and student identifiers of

students who got a grade of B in some course, together with the cid for the course.

We can define a view for this purpose. Using SQL-92 notation:

CREATE VIEW B-Students (name, sid, course)

AS SELECT S.sname, S.sid, E.cid

FROM Students S, Enrolled E

WHERE S.sid = E.sid AND E.grade = ‘B’

The view B-Students has three fields called name, sid, and course with the same

domains as the fields sname and sid in Students and cid in Enrolled. (If the optional

arguments name, sid, and course are omitted from the CREATE VIEW statement, the

column names sname, sid, and cid are inherited.)

This view can be used just like a base table, or explicitly stored table, in defining new

queries or views. Given the instances of Enrolled and Students shown in Figure 3.4, B-

Students contains the tuples shown in Figure 3.18. Conceptually, whenever B-Students

is used in a query, the view definition is first evaluated to obtain the corresponding

instance of B-Students, and then the rest of the query is evaluated treating B-Students

like any other relation referred to in the query. (We will discuss how queries on views

are evaluated in practice in Chapter 23.)

The Relational Model 79

name sid course

Jones 53666 History105

Guldu 53832 Reggae203

Figure 3.18 An Instance of the B-Students View

3.6.1 Views, Data Independence, Security

Consider the levels of abstraction that we discussed in Section 1.5.2. The physical

schema for a relational database describes how the relations in the conceptual schema

are stored, in terms of the file organizations and indexes used. The conceptual schema is

the collection of schemas of the relations stored in the database. While some relations

in the conceptual schema can also be exposed to applications, i.e., be part of the

external schema of the database, additional relations in the external schema can be

defined using the view mechanism. The view mechanism thus provides the support

for logical data independence in the relational model. That is, it can be used to define

relations in the external schema that mask changes in the conceptual schema of the

database from applications. For example, if the schema of a stored relation is changed,

we can define a view with the old schema, and applications that expect to see the old

schema can now use this view.

Views are also valuable in the context of security: We can define views that give a

group of users access to just the information they are allowed to see. For example, we

can define a view that allows students to see other students’ name and age but not

their gpa, and allow all students to access this view, but not the underlying Students

table (see Chapter 17).

3.6.2 Updates on Views

The motivation behind the view mechanism is to tailor how users see the data. Users

should not have to worry about the view versus base table distinction. This goal is

indeed achieved in the case of queries on views; a view can be used just like any other

relation in defining a query. However, it is natural to want to specify updates on views

as well. Here, unfortunately, the distinction between a view and a base table must be

kept in mind.

The SQL-92 standard allows updates to be specified only on views that are defined

on a single base table using just selection and projection, with no use of aggregate

operations. Such views are called updatable views. This definition is oversimplified,

but it captures the spirit of the restrictions. An update on such a restricted view can

80 Chapter 3

always be implemented by updating the underlying base table in an unambiguous way.

Consider the following view:

CREATE VIEW GoodStudents (sid, gpa)

AS SELECT S.sid, S.gpa

FROM Students S

WHERE S.gpa > 3.0

We can implement a command to modify the gpa of a GoodStudents row by modifying

the corresponding row in Students. We can delete a GoodStudents row by deleting

the corresponding row from Students. (In general, if the view did not include a key

for the underlying table, several rows in the table could ‘correspond’ to a single row

in the view. This would be the case, for example, if we used S.sname instead of S.sid

in the definition of GoodStudents. A command that affects a row in the view would

then affect all corresponding rows in the underlying table.)

We can insert a GoodStudents row by inserting a row into Students, using null values

in columns of Students that do not appear in GoodStudents (e.g., sname, login). Note

that primary key columns are not allowed to contain null values. Therefore, if we

attempt to insert rows through a view that does not contain the primary key of the

underlying table, the insertions will be rejected. For example, if GoodStudents con-

tained sname but not sid, we could not insert rows into Students through insertions

to GoodStudents.

An important observation is that an INSERT or UPDATE may change the underlying

base table so that the resulting (i.e., inserted or modified) row is not in the view! For

example, if we try to insert a row 〈51234, 2.8〉 into the view, this row can be (padded

with null values in the other fields of Students and then) added to the underlying

Students table, but it will not appear in the GoodStudents view because it does not

satisfy the view condition gpa > 3.0. The SQL-92 default action is to allow this

insertion, but we can disallow it by adding the clause WITH CHECK OPTION to the

definition of the view.

We caution the reader that when a view is defined in terms of another view, the inter-

action between these view definitions with respect to updates and the CHECK OPTION

clause can be complex; we will not go into the details.

Need to Restrict View Updates

While the SQL-92 rules on updatable views are more stringent than necessary, there

are some fundamental problems with updates specified on views, and there is good

reason to limit the class of views that can be updated. Consider the Students relation

and a new relation called Clubs:

The Relational Model 81

Clubs(cname: string, jyear: date, mname: string)

A tuple in Clubs denotes that the student called mname has been a member of the

club cname since the date jyear.4 Suppose that we are often interested in finding the

names and logins of students with a gpa greater than 3 who belong to at least one

club, along with the club name and the date they joined the club. We can define a

view for this purpose:

CREATE VIEW ActiveStudents (name, login, club, since)

AS SELECT S.sname, S.login, C.cname, C.jyear

FROM Students S, Clubs C

WHERE S.sname = C.mname AND S.gpa > 3

Consider the instances of Students and Clubs shown in Figures 3.19 and 3.20. When

cname jyear mname

Sailing 1996 Dave

Hiking 1997 Smith

Rowing 1998 Smith

Figure 3.19 An Instance C of Clubs

sid name login age gpa

50000 Dave dave@cs 19 3.3

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

53650 Smith smith@math 19 3.8

Figure 3.20 An Instance S3 of Students

evaluated using the instances C and S3, ActiveStudents contains the rows shown in

Figure 3.21.

name login club since

Dave dave@cs Sailing 1996

Smith smith@ee Hiking 1997

Smith smith@ee Rowing 1998

Smith smith@math Hiking 1997

Smith smith@math Rowing 1998

Figure 3.21 Instance of ActiveStudents

Now suppose that we want to delete the row 〈Smith, smith@ee, Hiking, 1997〉 from Ac-

tiveStudents. How are we to do this? ActiveStudents rows are not stored explicitly but

are computed as needed from the Students and Clubs tables using the view definition.

So we must change either Students or Clubs (or both) in such a way that evaluating the

4We remark that Clubs has a poorly designed schema (chosen for the sake of our discussion of viewupdates), since it identifies students by name, which is not a candidate key for Students.

82 Chapter 3

view definition on the modified instance does not produce the row 〈Smith, smith@ee,

Hiking, 1997.〉 This task can be accomplished in one of two ways: by either deleting

the row 〈53688, Smith, smith@ee, 18, 3.2〉 from Students or deleting the row 〈Hiking,

1997, Smith〉 from Clubs. But neither solution is satisfactory. Removing the Students

row has the effect of also deleting the row 〈Smith, smith@ee, Rowing, 1998〉 from the

view ActiveStudents. Removing the Clubs row has the effect of also deleting the row

〈Smith, smith@math, Hiking, 1997〉 from the view ActiveStudents. Neither of these

side effects is desirable. In fact, the only reasonable solution is to disallow such updates

on views.

There are views involving more than one base table that can, in principle, be safely

updated. The B-Students view that we introduced at the beginning of this section

is an example of such a view. Consider the instance of B-Students shown in Figure

3.18 (with, of course, the corresponding instances of Students and Enrolled as in Figure

3.4). To insert a tuple, say 〈Dave, 50000, Reggae203〉 B-Students, we can simply insert

a tuple 〈Reggae203, B, 50000〉 into Enrolled since there is already a tuple for sid 50000

in Students. To insert 〈John, 55000, Reggae203〉, on the other hand, we have to insert

〈Reggae203, B, 55000〉 into Enrolled and also insert 〈55000, John, null, null, null〉

into Students. Observe how null values are used in fields of the inserted tuple whose

value is not available. Fortunately, the view schema contains the primary key fields

of both underlying base tables; otherwise, we would not be able to support insertions

into this view. To delete a tuple from the view B-Students, we can simply delete the

corresponding tuple from Enrolled.

Although this example illustrates that the SQL-92 rules on updatable views are un-

necessarily restrictive, it also brings out the complexity of handling view updates in

the general case. For practical reasons, the SQL-92 standard has chosen to allow only

updates on a very restricted class of views.

3.7 DESTROYING/ALTERING TABLES AND VIEWS

If we decide that we no longer need a base table and want to destroy it (i.e., delete

all the rows and remove the table definition information), we can use the DROP TABLE

command. For example, DROP TABLE Students RESTRICT destroys the Students table

unless some view or integrity constraint refers to Students; if so, the command fails.

If the keyword RESTRICT is replaced by CASCADE, Students is dropped and any ref-

erencing views or integrity constraints are (recursively) dropped as well; one of these

two keywords must always be specified. A view can be dropped using the DROP VIEW

command, which is just like DROP TABLE.

ALTER TABLE modifies the structure of an existing table. To add a column called

maiden-name to Students, for example, we would use the following command:

The Relational Model 83

ALTER TABLE Students

ADD COLUMN maiden-name CHAR(10)

The definition of Students is modified to add this column, and all existing rows are

padded with null values in this column. ALTER TABLE can also be used to delete

columns and to add or drop integrity constraints on a table; we will not discuss these

aspects of the command beyond remarking that dropping columns is treated very

similarly to dropping tables or views.

3.8 POINTS TO REVIEW

The main element of the relational model is a relation. A relation schema describes

the structure of a relation by specifying the relation name and the names of each

field. In addition, the relation schema includes domain constraints, which are

type restrictions on the fields of the relation. The number of fields is called the

degree of the relation. The relation instance is an actual table that contains a set

of tuples that adhere to the relation schema. The number of tuples is called the

cardinality of the relation. SQL-92 is a standard language for interacting with a

DBMS. Its data definition language (DDL) enables the creation (CREATE TABLE)

and modification (DELETE, UPDATE) of relations. (Section 3.1)

Integrity constraints are conditions on a database schema that every legal database

instance has to satisfy. Besides domain constraints, other important types of

ICs are key constraints (a minimal set of fields that uniquely identify a tuple)

and foreign key constraints (fields in one relation that refer to fields in another

relation). SQL-92 supports the specification of the above kinds of ICs, as well as

more general constraints called table constraints and assertions. (Section 3.2)

ICs are enforced whenever a relation is modified and the specified ICs might con-

flict with the modification. For foreign key constraint violations, SQL-92 provides

several alternatives to deal with the violation: NO ACTION, CASCADE, SET DEFAULT,

and SET NULL. (Section 3.3)

A relational database query is a question about the data. SQL supports a very

expressive query language. (Section 3.4)

There are standard translations of ER model constructs into SQL. Entity sets

are mapped into relations. Relationship sets without constraints are also mapped

into relations. When translating relationship sets with constraints, weak entity

sets, class hierarchies, and aggregation, the mapping is more complicated. (Sec-

tion 3.5)

A view is a relation whose instance is not explicitly stored but is computed as

needed. In addition to enabling logical data independence by defining the external

schema through views, views play an important role in restricting access to data for

84 Chapter 3

security reasons. Since views might be defined through complex queries, handling

updates specified on views is complicated, and SQL-92 has very stringent rules on

when a view is updatable. (Section 3.6)

SQL provides language constructs to modify the structure of tables (ALTER TABLE)

and to destroy tables and views (DROP TABLE). (Section 3.7)

EXERCISES

Exercise 3.1 Define the following terms: relation schema, relational database schema, do-

main, relation instance, relation cardinality, and relation degree.

Exercise 3.2 How many distinct tuples are in a relation instance with cardinality 22?

Exercise 3.3 Does the relational model, as seen by an SQL query writer, provide physical

and logical data independence? Explain.

Exercise 3.4 What is the difference between a candidate key and the primary key for a given

relation? What is a superkey?

Exercise 3.5 Consider the instance of the Students relation shown in Figure 3.1.

1. Give an example of an attribute (or set of attributes) that you can deduce is not a

candidate key, based on this instance being legal.

2. Is there any example of an attribute (or set of attributes) that you can deduce is a

candidate key, based on this instance being legal?

Exercise 3.6 What is a foreign key constraint? Why are such constraints important? What

is referential integrity?

Exercise 3.7 Consider the relations Students, Faculty, Courses, Rooms, Enrolled, Teaches,

and Meets In that were defined in Section 1.5.2.

1. List all the foreign key constraints among these relations.

2. Give an example of a (plausible) constraint involving one or more of these relations that

is not a primary key or foreign key constraint.

Exercise 3.8 Answer each of the following questions briefly. The questions are based on the

following relational schema:

Emp(eid: integer, ename: string, age: integer, salary: real)

Works(eid: integer, did: integer, pct time: integer)

Dept(did: integer, dname: string, budget: real, managerid: integer)

1. Give an example of a foreign key constraint that involves the Dept relation. What are

the options for enforcing this constraint when a user attempts to delete a Dept tuple?

The Relational Model 85

2. Write the SQL statements required to create the above relations, including appropriate

versions of all primary and foreign key integrity constraints.

3. Define the Dept relation in SQL so that every department is guaranteed to have a

manager.

4. Write an SQL statement to add ‘John Doe’ as an employee with eid = 101, age = 32

and salary = 15, 000.

5. Write an SQL statement to give every employee a 10% raise.

6. Write an SQL statement to delete the ‘Toy’ department. Given the referential integrity

constraints you chose for this schema, explain what happens when this statement is

executed.

Exercise 3.9 Consider the SQL query whose answer is shown in Figure 3.6.

1. Modify this query so that only the login column is included in the answer.

2. If the clause WHERE S.gpa >= 2 is added to the original query, what is the set of tuples

in the answer?

Exercise 3.10 Explain why the addition of NOT NULL constraints to the SQL definition of

the Manages relation (in Section 3.5.3) would not enforce the constraint that each department

must have a manager. What, if anything, is achieved by requiring that the ssn field of Manages

be non-null?

Exercise 3.11 Suppose that we have a ternary relationship R between entity sets A, B,

and C such that A has a key constraint and total participation and B has a key constraint;

these are the only constraints. A has attributes a1 and a2, with a1 being the key; B and

C are similar. R has no descriptive attributes. Write SQL statements that create tables

corresponding to this information so as to capture as many of the constraints as possible. If

you cannot capture some constraint, explain why.

Exercise 3.12 Consider the scenario from Exercise 2.2 where you designed an ER diagram

for a university database. Write SQL statements to create the corresponding relations and

capture as many of the constraints as possible. If you cannot capture some constraints, explain

why.

Exercise 3.13 Consider the university database from Exercise 2.3 and the ER diagram that

you designed. Write SQL statements to create the corresponding relations and capture as

many of the constraints as possible. If you cannot capture some constraints, explain why.

Exercise 3.14 Consider the scenario from Exercise 2.4 where you designed an ER diagram

for a company database. Write SQL statements to create the corresponding relations and

capture as many of the constraints as possible. If you cannot capture some constraints,

explain why.

Exercise 3.15 Consider the Notown database from Exercise 2.5. You have decided to rec-

ommend that Notown use a relational database system to store company data. Show the

SQL statements for creating relations corresponding to the entity sets and relationship sets

in your design. Identify any constraints in the ER diagram that you are unable to capture in

the SQL statements and briefly explain why you could not express them.

86 Chapter 3

Exercise 3.16 Translate your ER diagram from Exercise 2.6 into a relational schema, and

show the SQL statements needed to create the relations, using only key and null constraints.

If your translation cannot capture any constraints in the ER diagram, explain why.

In Exercise 2.6, you also modified the ER diagram to include the constraint that tests on a

plane must be conducted by a technician who is an expert on that model. Can you modify

the SQL statements defining the relations obtained by mapping the ER diagram to check this

constraint?

Exercise 3.17 Consider the ER diagram that you designed for the Prescriptions-R-X chain of

pharmacies in Exercise 2.7. Define relations corresponding to the entity sets and relationship

sets in your design using SQL.

Exercise 3.18 Write SQL statements to create the corresponding relations to the ER dia-

gram you designed for Exercise 2.8. If your translation cannot capture any constraints in the

ER diagram, explain why.

PROJECT-BASED EXERCISES

Exercise 3.19 Create the relations Students, Faculty, Courses, Rooms, Enrolled, Teaches,

and Meets In in Minibase.

Exercise 3.20 Insert the tuples shown in Figures 3.1 and 3.4 into the relations Students and

Enrolled. Create reasonable instances of the other relations.

Exercise 3.21 What integrity constraints are enforced by Minibase?

Exercise 3.22 Run the SQL queries presented in this chapter.

BIBLIOGRAPHIC NOTES

The relational model was proposed in a seminal paper by Codd [156]. Childs [146] and Kuhns

[392] foreshadowed some of these developments. Gallaire and Minker’s book [254] contains

several papers on the use of logic in the context of relational databases. A system based on a

variation of the relational model in which the entire database is regarded abstractly as a single

relation, called the universal relation, is described in [655]. Extensions of the relational model

to incorporate null values, which indicate an unknown or missing field value, are discussed by

several authors; for example, [280, 335, 542, 662, 691].

Pioneering projects include System R [33, 129] at IBM San Jose Research Laboratory (now

IBM Almaden Research Center), Ingres [628] at the University of California at Berkeley,

PRTV [646] at the IBM UK Scientific Center in Peterlee, and QBE [702] at IBM T.J. Watson

Research Center.

A rich theory underpins the field of relational databases. Texts devoted to theoretical aspects

include those by Atzeni and DeAntonellis [38]; Maier [436]; and Abiteboul, Hull, and Vianu

[3]. [355] is an excellent survey article.

The Relational Model 87

Integrity constraints in relational databases have been discussed at length. [159] addresses se-

mantic extensions to the relational model, but also discusses integrity, in particular referential

integrity. [305] discusses semantic integrity constraints. [168] contains papers that address

various aspects of integrity constraints, including in particular a detailed discussion of refer-

ential integrity. A vast literature deals with enforcing integrity constraints. [41] compares the

cost of enforcing integrity constraints via compile-time, run-time, and post-execution checks.

[124] presents an SQL-based language for specifying integrity constraints and identifies con-

ditions under which integrity rules specified in this language can be violated. [624] discusses

the technique of integrity constraint checking by query modification. [149] discusses real-time

integrity constraints. Other papers on checking integrity constraints in databases include

[69, 103, 117, 449]. [593] considers the approach of verifying the correctness of programs that

access the database, instead of run-time checks. Note that this list of references is far from

complete; in fact, it does not include any of the many papers on checking recursively specified

integrity constraints. Some early papers in this widely studied area can be found in [254] and

[253].

For references on SQL, see the bibliographic notes for Chapter 5. This book does not discuss

specific products based on the relational model, but many fine books do discuss each of

the major commercial systems; for example, Chamberlin’s book on DB2 [128], Date and

McGoveran’s book on Sybase [172], and Koch and Loney’s book on Oracle [382].

Several papers consider the problem of translating updates specified on views into updates

on the underlying table [49, 174, 360, 405, 683]. [250] is a good survey on this topic. See

the bibliographic notes for Chapter 23 for references to work querying views and maintaining

materialized views.

[642] discusses a design methodology based on developing an ER diagram and then translating

to the relational model. Markowitz considers referential integrity in the context of ER to

relational mapping and discusses the support provided in some commercial systems (as of

that date) in [446, 447].


Recommended