Questions tagged [database-design]

The development of the conceptual schema and/or the logical model and/or the physical settings of a database.

The result of database design is a plan for the construction of a database that captures some of the essential features of the proposed database, but omits a lot of less important detail. These plans often take the form of data models, and database design can be learned as the art of constructing a certain kind of data model.

Most databases that capture and manage semi-permanent data operate under the control of a Database Management System (DBMS). Prominent DBMS products are SQL Server, Oracle RDBMS, and DB2. There are dozens of others. Many of the questions and answers you’ll find under this tag relate to one of these DBMS products, but some design issues are DBMS independent.

The amount of preparation and education you’ll need before building your first successful database varies widely depending on many factors. Among other factors, it depends on how ambitious your database project is and on what prior experience you bring to bear on the project. Very experienced programmers sometimes underestimate the amount of material there is to learn about database design.

Sometimes programmers learn well by trial and error, or by postponing formal learning until their second or third project. Other times, database design neophytes make design decisions that lead into pitfalls that are very difficult to reverse.

There are many ways to measure the quality of a database design. Programmers building their first database are often primarily concerned with performance. There’s no question that performance is important. A bad design can easily result in database operations that take ten to a hundred times as much time as they should.

But don’t let performance issues blind you to other aspects of good design. In particular, future proofing of a database is enormously important. Failure to do this can result in a database that traps its users at the first level and prevents their data from evolving as their needs evolve.

Another aspect involves separating out the hidden features of a database (sometimes called physical design) from the public features visible across the application interface (sometimes called logical design). A neat separation of these features can result a database that can be tweaked and tuned quite a bit with no changes to application code. A poor separation of these features can result in a database that makes a nightmare out of application development or database administration.

Another consideration is whether the proposed database will be embedded within a single application, or whether it will be an information hub that serves the needs of multiple applications. Some design decisions will be made very differently in these two cases.

Yet another consideration is whether the application is going to perform all data management functions on behalf of its clients, or whether custodial responsibility for the database and its data is going to be vested in one or more DBAs (Data Base Administrators).


What kinds of questions will appear in the database-design tag?

You'll see a lot of questions about table design, data normalization, index design, query optimization, constraint declarations, and keys. A lot of questions, and many of the responses will address issues of speed or performance. There will be a lot of questions about key selection.

Most of the questions are about relational databases, including the SQL databases that are commonly called relational. A few questions are about "truly relational" databases or about "non relational" or "post relational" databases. A few are about semistructured or unstructured data.

A lot of questions tagged "database design" will also be tagged "data modeling". There is a huge overlap between the two subjects.

You'll see a lot of questions on the subject of table composition and decomposition. Closely related to table decomposition is the concept of data normalization. Indeed, many responders treat table decomposition and data normalization as though they are synonymous terms. They aren't quite synonymous. Nearly all improvements in data normalization result in table decomposition, but there are plenty of ways of decomposing tables that have nothing to do with normalization.

Data normalization is a brand new topic to many neophyte database designers. It's worth learning the rudiments of data normalization, even if the database you are building is small and simple. It's also sometimes worthwhile to disregard the rules of data normalization, but you really have to know what you are doing.

You'll also see a lot of questions on the subject of index design. Closely related to index design is query optimization. Many questions about either index design or query design have to do with how much effort the programmer should expend in getting the very best result out of the optimizer.

Three things are worth keeping in mind. First, optimization is often a matter of tradeoffs. Sometimes organizing things for rapid query will slow down data updates. Sometimes speed really matters in some database operations, but not others.

Second, you really need to pay attention to those things that that slow operations down from seconds to minutes, or from minutes to hours, before you worry about 10% improvements.

Third, database delays vary enormously as the volume of data increases and as the number of concurrent users increases. Simple tests with one user and sample data can really mislead you about speed in a production environment.

6175 questions
300
votes
15 answers

Can MySQL reasonably perform queries on billions of rows?

I am planning on storing scans from a mass spectrometer in a MySQL database and would like to know whether storing and analyzing this amount of data is remotely feasible. I know performance varies wildly depending on the environment, but I'm looking…
haxney
  • 2,463
  • 2
  • 13
  • 7
160
votes
12 answers

Should binary files be stored in the database?

What is the best place for storing binary files that are related to data in your database? Should you: Store in the database with a blob Store on the filesystem with a link in the database Store in the filesystem but rename to a hash of the…
Jack Douglas
  • 38,607
  • 15
  • 97
  • 174
145
votes
3 answers

Advantages and Disadvantages to using ENUM vs Integer types?

Lets say in some random table, you have a column named status. It's real-world values would be either enabled or disabled. Is it better for this column's data type to be an int/bool (1 or zero) or to use ENUM with the values being enabled and…
Jake Wilson
  • 2,377
  • 8
  • 21
  • 23
141
votes
8 answers

Why shouldn't we allow NULLs?

I remember reading this one article about database design and I also remember it said you should have field properties of NOT NULL. I don't remember why this was the case though. All I can seem to think of is that, as an application developer, you…
Thomas Stringer
  • 41,694
  • 9
  • 115
  • 154
128
votes
3 answers

How can I specify the position for a new column in PostgreSQL?

If I have a table with the columns: id | name | created_date and would like to add a column, I use: alter table my_table add column email varchar(255) Then the column is added after the created_date column. Is there any way I can specify the…
Jonas
  • 31,495
  • 27
  • 59
  • 64
119
votes
3 answers

Is a composite index also good for queries on the first field?

Let's say I have a table with fields A and B. I make regular queries on A+B, so I created a composite index on (A,B). Would queries on only A also be fully optimized by the composite index? Additionally, I created an index on A, but Postgres still…
Luciano
  • 1,621
  • 3
  • 12
  • 8
116
votes
5 answers

Best database and table design for billions of rows of data

I am writing an application that needs to store and analyze large amounts of electrical and temperature data. Basically I need to store large amounts of hourly electricity usage measurements for the past several years and for many years to come for…
Gecata
  • 1,263
  • 3
  • 8
  • 5
108
votes
5 answers

Storing vs calculating aggregate values

Are there any guidelines or rules of thumb to determine when to store aggregate values and when to calculate them on the fly? For example, suppose I have widgets which users can rate (see schema below). Each time I display a widget I could…
BenV
  • 4,823
  • 7
  • 37
  • 38
81
votes
10 answers

What are the arguments against or for putting application logic in the database layer?

NOTE The audience of programmers.se and dba.se is different, and will have different viewpoints, so in this instance I think it's valid to duplicate What are the arguments against or for putting application logic in the database layer? on…
Phil Lello
  • 1,449
  • 1
  • 11
  • 9
76
votes
7 answers

Writing a simple bank schema: How should I keep my balances in sync with their transaction history?

I am writing the schema for a simple bank database. Here are the basic specifications: The database will store transactions against a user and currency. Every user has one balance per currency, so each balance is simply the sum of all…
73
votes
5 answers

Is there a name for this database schema of key values?

We process a routine data feed from a client who just refactored their database from a form that seems familiar (one row per entity, one column per attribute) to one that seems unfamiliar to me (one row per entity per attribute): Before: one column…
prototype
  • 843
  • 1
  • 7
  • 6
71
votes
6 answers

What is the best way to store an email address in PostgreSQL?

What would be the right datatype to store email addresses in PostgreSQL? I can use varchar (or even text), but I wonder if there is a more specific data type for emails.
Adam Matan
  • 11,019
  • 29
  • 79
  • 94
69
votes
1 answer

How to manage DEFAULT PRIVILEGES for USERs on a DATABASE vs SCHEMA?

I want to migrate a fairly simple, internal, database driven application from SQLite3 to PostgreSQL 9.3 and tighten the permissions in the DB as I go. The application currently consists of a command to update the data; and one to query it.…
Jim Dennis
  • 745
  • 1
  • 6
  • 12
62
votes
9 answers

Plural vs Singular Table Name

How should I name my Tables when creating a new database? Singular: Client or Plural: Clients?
57
votes
9 answers

Should you design the database before the application code is written?

What is the easiest and most efficient way to design a database? From my perspective, there are a couple of options for an application's data store design: Design the database as best as you can initially before writing any application code. This…
Thomas Stringer
  • 41,694
  • 9
  • 115
  • 154
1
2 3
99 100