Maybe you know how to drive a car. Maybe you can fix the carburetor. You probably don’t assume that you know how to design the car. However, programmers who can write huge SQL statements and understand the output of an EXPLAIN statement often assume that they can design a database. Admittedly, unless we’re comparing Oracle 9i with the 1967 Volkswagen Beetle, databases usually aren’t more complex than cars, but a properly designed database can be very hard to come by yet it’s the key to a solid application.
Recently, a friend asked for help with a programming problem and I agreed to take a look. After hearing a bit of explanation of the problem domain, and taking a poke and what he had, I realized that while his code was indeed buggy, the underlying bug stemmed from bad database design in an older version of MySQL (some version 4, but I don’t recall the exact one). Many, many developer hours are wasted on bad database design, but they often don’t seem to realize this. We accept that bugs are something we need to deal with but don’t always realize our opportunities for minimizing those bugs. We curse at cleaning up garbage data in our database without realizing that if you have a properly designed database, it can be very difficult to insert garbage data.
Note: I’ve heavily changed the description of the problem domain to avoid causing my friend problems, but database design is identical. In fact, I’ve so heavily changed the problem description that I’m practically lying about my role in it. I’m an American. Sue me.
My friend’s problem stemmed from some code which was trying to insert an undefined value into an ENUM field named task.priority and he wasn’t sure why his code was generating a warning. A warning in and of itself isn’t a bug, but it can hint at bugs. Boy were there bugs! (The warning was generated by code converting an undefined value to a NULL, but I won’t go into that). I did some digging in the database to understand what was up. Here was my first query:
mysql> select * from task where priority not in ('1','2','3');
+-----------+--------------+--------------+----------+
| job_queue | task | name | priority |
+-----------+--------------+--------------+----------+
| 0 | glj_three | Sum Accounts | |
| 0 | glj_demand | Reconcile | |
+-----------+--------------+--------------+----------+
So my second query was SHOW CREATE TABLE task.
CREATE TABLE task (
job_queue int(10) unsigned NOT NULL default '0',
task varchar(15) NOT NULL default '',
name varchar(50) NOT NULL default '',
priority enum('1','2','3') default NULL,
PRIMARY KEY (task),
KEY id (job_queue)
);
Note that many versions of MySQL allow an empty string in this ENUM field :(
Before covering the numerous issues involved, let’s discuss what this is for. We have ‘job queues’ which get executed and each job comprises a series of tasks with a human readable name and a priority for the task in the queue. What’s really crucial here is the business rules. As much as is possible, business rules must be in the database layer, not in the code layer. If you do this, another application using the database is much less likely to get the business rules wrong. Here are the rules for this problem.
- Each job must have one primary task (priority 1).
- Each job must have one secondary task (priority 2).
- A job many have zero to many tertiary tasks (priority 3).
- No task may be in more than one queue.
- As a job is canceled, all tasks are released and do not have to be reassigned.
- Primary tasks must run first for each job.
- Secondary tasks must run second for each job.
- Tertiary tasks must run after secondary but in any order.
There is no job with an id of 0 (zero), so this was used to represent ‘free’ tasks (why they didn’t use NULL is beyond me, but even then there’s a better way of doing this). As it turns out, special casing of the job id can lead to bugs, but we’ll ignore that for this blog entry.
Since we had priority tasks failing, I decided to see if we had priorities without jobs since this wouldn’t make much sense.
mysql> select name, priority from task where job_queue = 0 limit 5; +------------------+----------+ | Name | priority | +------------------+----------+ | Cleanup Accts | 2 | | Check Free Space | 1 | | Allocate Funds | 2 | | Process Invoices | 1 | | Check Bonus | 2 | +------------------+----------+
This was apparently from code which attempted to release tasks from jobs with the SQL similar to the following:
UPDATE tasks SET job_queue = 0 WHERE job_queue = ?
Before we go on, a side note is in order. The above code is better written as this:
UPDATE tasks
SET job_queue = 0,
priority = NULL
WHERE job_queue = ?
First off, NULL values in a database can be very dangerous as they can easily lead to logic problems which produce incorrect query results. Second, if you find yourself with several fields in table which are coupled together and must always be updated together, perhaps they’re not really dependent on your primary key and should be in another table? It doesn’t guarantee a bug, but it’s a code smell which should be investigated. In our case, it’s a smell which leads to a better solution later.
Back to the issue at hand, we have a potential problem. If the code naively pulls tasks based on priority and the code which removes jobs (sets job_queue to zero) forgets to clear out priority, that could be a bug and different sections of the code are tightly coupled. In fact, it could be easy to get confused priorities merely by having this field:
SELECT job, task, name FROM task ORDER BY job, priority
As it turns out, priorities might have an empty string, but they could also be NULL (arg!) so that should be written as:
SELECT job, task, name FROM task WHERE job != 0 AND priority != '' AND priority IS NOT NULL ORDER BY job, priority
Why do we have these weird special cases showing up? They’re hardly intuitive. Instead, considering the aforementioned code smell, it would be better to have a job_queue_task table (typing this SQL from memory, so it may have mistakes):
CREATE TABLE job_queue_task (
job_queue int(10) NOT NULL,
task varchar(15) NOT NULL UNIQUE,
priority enum('1','2','3') NOT NULL,
PRIMARY KEY (job_queue, task),
INDEX jq_idx (job_queue),
FOREIGN KEY (job_queue) REFERENCES job_queue(id),
INDEX task_idx (task),
FOREIGN KEY (task) REFERENCES task(id)
);
And your task table now looks like this (may as well make it plural):
CREATE TABLE tasks ( task varchar(15) NOT NULL PRIMARY KEY, name varchar(50) NOT NULL, );
What this does is allow you to insert a record in the job_queue_task table for each task for a job and to delete that record when not needed (or delete an entire job_queue id from that table and not worry about individual fields). Inserting or deleting entire records usually leads to fewer bugs than trying to always remember which fields to update and which to leave alone, particularly as tables grow in size. Selecting tasks now becomes this:
CREATE VIEW job_tasks AS SELECT
job_queue AS queue,
tasks.task AS task,
tasks.name AS name
FROM job_queue_task, tasks
WHERE job_queue_task.task = tasks.task
ORDER BY job_queue, priority
Then for all apps to figure out which jobs tasks are available, they can skip the tricky logic with:
SELECT queue, task, name FROM job_queue
Want to remove a job queue? Remember that we had this SQL:
UPDATE tasks
SET job_queue = 0,
priority = NULL
WHERE job_queue = ?
Now it’s as simple as this:
DELETE FROM job_queue_tasks WHERE job_queue = ?
The special cases have gone away and wrapping the CREATE TABLE statement in a view (not possible in my friend’s version of MySQL) that apps have read access to can abstract most of the problems away.
That leaves the question of the single primary and secondary priorities and zero to many tertiary priorities. Depending on your database, you can write the rule in a trigger or have the trigger call a stored procedure which, before inserts or updates, validates that you’re not inserting a duplicate primary or secondary priority (note that you can do interesting things by associating triggers with views, but MySQL doesn’t support this).
There’s a lot more which could be done with this, but this is just a basic example of how to better represent business rules in a database. When properly enforced, other applications can then use this database and are much less likely to corrupt the data in there. This saves developer time as they write fewer bugs and it lets your DBAs sleep at night.


Ewww, triggers. Last resort. Last resort.
You don’t need them.
Get rid of priority 3, make the
prioritycolumn nullable, and make a unique index overjob_queue, task, priority. Then you can have exactly one row each for priorities 1 and 2 for any givenjob_queue, taskpair and any number of rows withNULLpriority for the same pair.I view this as very dangerous (and that is toned down fromt he words that I was originally going to use). Just because you can build it from a technical point of view, doesn't mean that you should.
A couple of questions:
- What's your defintition of a business rule? If it's data integrity (as are most of the 'rules' that you list) then yes it belongs in the database, but it's not a business rule as most people would understand it.
- My definition of a business rule is a piece of logic that can be articulated by end users that is reviewed and changed frequently. For a larger database, how is it possible to review 'rules' once they are scattered throughout the database?
I don't mean to come across as too negative as at least you have the courage to suggest a way of doing things. Likewise, the approach I prefer is at http://www.oreillynet.com/articles/author/2366
In summary: I don't use a database to run rules, and I don't use a rule engine to store data.
Paul
Paul: I understand your concern, but many business rules should be relatively invariant. For example. maybe a customer's required to have an address and maybe they aren't. That's the sort of rule which tends to be an invariant (but read on). Business rules which change little should be reflected in the database layer to ensure that you're not dependent on the applications "getting them right". Personally, I'm tired of fixing bugs caused code in language X getting the business rules wrong because they were implemented in language Y. There are plenty of ways this can come about and it's not always easy to share this knowledge outside of the database. Of course, once it's pulled from the database, no code is then required to respect it. It's a very common source of bugs.
On the other hand, you also have application rules. Those should generally not be in the database. This is application logic and control flow. It's similar to the difference between the "model" and "controller" and an MVC framework.
If you want to keep the invariant rules out of the database, that's your choice, but it hinders complexity management. Of course, some people will claim that there are no invariant rules and this is true. But that's why we have test suites and learn to refactor databases the way we refactor code.
@Ovid
I think the key here is 'what is a business rule'. To me the sample you give (customer requiring an address) is a data integrity rule and should , without doubt , be in the database. You are correct that these rules do not change too often.
To me , a business rule is more 'when the customer makes an order of more than 10000, a credit check needs to be run'. These do change and need constant review. I tend towards using a Rule Engine for these (and normally JBoss Rules) as
- The rules are written in a near english language
- The rules can be accessed by many different applications (no risk of differing implementations)
- The rules can be versioned and stored in a database (as data, not as db code)
Paul
@Paul: the reason the "should addresses be in the customer table?" question arises is because it reflects a business rule. Two similar sets of data can have different normalization requirements based on business rules.
One company I worked for wanted to send out a mass mailers to their customers. That was pretty easy. We selected the customer name and address from the customer table. Later we were having a huge sale and wanted to bring in new business and we wanted to send out mail to every addresses, regardless of whether or not we had a customer for the address. We had to pay a lot of money to an outside company to collect this data.
For that other company, they have customers (us, for example) who are associated with addresses, but since address would no longer be dependent on customers, the addresses would properly be stored in another table. Proper normalization is a reflection of business rules. However, it can't be perfect, as in the case of my company's database not supporting this one-off need). If we found ourselves doing this regularly, it would mean the nature of our business has changed, our business rules have changed, and we would then have to determine if refactoring the database and maintaining the address list would be a better solution than buying the address list from another company.
@Paul:
I forgot to mention that I do agree that it can be difficult to store all rules in a database (particularly when working with older database software). However, if given the choice, I'd at least prefer to put them in stored procedures to ensure that code run against the database is unlikely to subvert said rules. How can you protect your data when the guy down the hall is writing a quick hack to "fix things"? How do you protect your data when you have a huge database and another large project needs access to it, but they're using a different programming language?
Pulling this "must obey" knowledge out of the database means that someone using the database can easily ignore it and is likely ignorant of parts of it.
Hi Ovid,
I am a little late is discovering this post, by here goes anyways.
Whether you want to have your rules in your DB or externalized using a rule engine is a design choice that depends on what kind of IT team you have, and what your business guys need.
Even considering the "quick fix hack angle", I think you will have more long term testing issues with the db way, than with the rule engine way. Many customers and evaluators that I have come across raise testing and maintenance as a major reason for choosing to use rule engines.
Even in the case where multiple systems run on different programming languages, I would suggest using the rule engine as a Web Service (popular term, Decision Service), and then letting clients use it.
My 2 cents
As you can notified, I am a student in Higher Degree Computer Science and I am building my project in Database specially by designing one of the college registration Database but actually, I am stacked because of the business rules of my database and I have checking through the internet to find out how people could help but no answer, so I need your help to assist in what I am doing.
Do not hesitate to contact me anytime even by my cellphone number at 0027725513954 or e-mail me by the above e-address.
hoping that I will get the answer from you as soon as possible,wish a wonderful journey!
Regards