Tuesday, September 1, 2009

Enterprise Software as a Service (SaaS) and Partitioning


Partitioning, new in MySQL 5.1, interacts with queries and indexes in complicated ways, and if you aren't careful it is easy to degrade performance. Select queries that go with the grain of the partitioning (queries where partition elimination occurs) can be much quicker, but select queries that go against that grain can be much slower. A query that goes against the grain must search every partition, so for a table with 12 partitions, one query against that table can turn into 12 queries, one per partition. For example, on a table partitioned by month, a query asking how much activity a product had over the past 12 months has to read all 12 monthly partitions.
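To make "with the grain" and "against the grain" concrete, here is a minimal sketch using a hypothetical month-partitioned table. The table name SaleByMonth, its columns, the index name and the partition boundaries are illustrative assumptions, not from the original. EXPLAIN PARTITIONS (available in MySQL 5.1) reports which partitions a query will touch.

```sql
-- Hypothetical month-partitioned table, for illustration only.
-- The partitioning column must appear in every unique key, hence the PK.
CREATE TABLE SaleByMonth (
   orderId int NOT NULL,
   productId int NOT NULL,
   purchaseAmount decimal(16,2) NOT NULL,
   purchaseDate datetime NOT NULL,
   PRIMARY KEY (orderId, purchaseDate),
   KEY idx_salebymonth_product (productId)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(purchaseDate)) (
   PARTITION p200901 VALUES LESS THAN (TO_DAYS('2009-02-01')),
   PARTITION p200902 VALUES LESS THAN (TO_DAYS('2009-03-01')),
   PARTITION p200903 VALUES LESS THAN (TO_DAYS('2009-04-01')),
   PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- With the grain: the predicate is on the partitioning column,
-- so the optimizer can skip months that cannot match.
EXPLAIN PARTITIONS
SELECT SUM(purchaseAmount)
FROM SaleByMonth
WHERE purchaseDate >= '2009-02-01' AND purchaseDate < '2009-03-01';

-- Against the grain: nothing restricts purchaseDate, so the local
-- productId index has to be probed in every partition.
EXPLAIN PARTITIONS
SELECT SUM(purchaseAmount)
FROM SaleByMonth
WHERE productId = 42;
```

The first plan's partitions column should be narrowed to the February partition (some versions may also list the first partition, because TO_DAYS() returns NULL for invalid dates and NULL sorts into the lowest range), while the second should list all four. In MySQL 5.7 and later, plain EXPLAIN already includes the partitions column.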

The ideal partitioning scheme is one in which every query needs to access data from only one partition. Enterprise software deployed as a service, where multiple enterprise tenants all live in one database, fits this description. Because an enterprise tenant (think of a company like a bank, manufacturing firm, or retailer, not a consumer using Facebook or Twitter) only queries its own data, the tenantId provides an ideal grain on which to divide up the data. This means every table that holds tenant-specific data must have a tenantId column. A Sale table for a multi-tenant database would look like this:


CREATE TABLE Sale (
   tenantId int NOT NULL,
   orderId int NOT NULL,
   customerId int NOT NULL,
   productId int NOT NULL,
   unit int NOT NULL,
   purchaseAmount decimal(16,2) NOT NULL,
   purchaseCost decimal(16,2) NOT NULL,
   purchaseDate datetime NOT NULL,
   -- Every unique key, including the primary key, must contain
   -- the partitioning column (tenantId).
   PRIMARY KEY (tenantId, orderId),
   KEY idx_sale_product (productId),
   KEY idx_sale_customer (customerId),
   KEY idx_sale_purchaseDate (purchaseDate)
) ENGINE=InnoDB
-- One partition per tenant.
PARTITION BY LIST(tenantId) (
   PARTITION t1 VALUES IN (1) ENGINE=InnoDB,
   PARTITION t2 VALUES IN (2) ENGINE=InnoDB,
   PARTITION t3 VALUES IN (3) ENGINE=InnoDB,
   PARTITION t4 VALUES IN (4) ENGINE=InnoDB,
   PARTITION t5 VALUES IN (5) ENGINE=InnoDB);
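Because every tenant-facing query carries its tenantId, each such query should touch a single partition. A minimal sketch against the Sale table above; the tenant, product and date values are hypothetical.

```sql
-- Tenant 3's sales of one product in 2009: tenantId = 3 pins the
-- query to partition t3, and idx_sale_product is searched only there.
EXPLAIN PARTITIONS
SELECT SUM(purchaseAmount) AS revenue,
       SUM(purchaseAmount - purchaseCost) AS margin
FROM Sale
WHERE tenantId = 3
  AND productId = 17
  AND purchaseDate >= '2009-01-01' AND purchaseDate < '2010-01-01';

-- Leaving out tenantId (for example in a cross-tenant report) goes
-- against the grain, so partitions t1 through t5 are all searched.
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM Sale WHERE productId = 17;
```

The first plan should list only t3 in its partitions column; the second should list all five partitions.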


If you are using InnoDB, an alternative to partitioning by tenant is to make tenantId the leading column of the primary key, which InnoDB uses as its clustered index, so each tenant's rows are stored together. Before MySQL had partitioning, this was a good way to implement a multi-tenant database. If you are curious about this type of solution, you can find more here:

http://dbscience.blogspot.com/2008_07_01_archive.html
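A minimal sketch of that alternative, assuming the same columns as Sale: no partitioning, but tenantId leads the primary key. The table name SaleClustered and the choice to prefix the secondary indexes with tenantId are assumptions made here for illustration, not details taken from the linked article.

```sql
-- Non-partitioned alternative: InnoDB clusters rows by primary key,
-- so leading with tenantId keeps each tenant's rows physically together.
CREATE TABLE SaleClustered (
   tenantId int NOT NULL,
   orderId int NOT NULL,
   customerId int NOT NULL,
   productId int NOT NULL,
   unit int NOT NULL,
   purchaseAmount decimal(16,2) NOT NULL,
   purchaseCost decimal(16,2) NOT NULL,
   purchaseDate datetime NOT NULL,
   PRIMARY KEY (tenantId, orderId),
   -- Assumption: secondary indexes also lead with tenantId so that
   -- tenant-scoped lookups become range scans within one tenant.
   KEY idx_salec_product (tenantId, productId),
   KEY idx_salec_customer (tenantId, customerId),
   KEY idx_salec_purchaseDate (tenantId, purchaseDate)
) ENGINE=InnoDB;
```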

For large data volumes, partitioning by tenant and using InnoDB clustered indexes as described in that article will perform roughly the same.

Where partitioning pays off is in administrative tasks such as server splits. When one server holds too many tenants and has to be split, the overloaded server's data can be replicated to a second server. After replication there are two servers, each holding all of the data but serving only about half of the tenants, so the other half is inactive on that server. Instead of a slow, resource-hungry mass delete of the now-inactive tenants on each server, their partitions can be dropped almost instantly. The server split is still painful, but this makes reallocating tenants across servers easier and gets the system fully available much sooner.
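A sketch of the cleanup step after such a split, assuming the Sale table above was fully replicated from server A to server B, and that tenants 1 to 3 stay on A while tenants 4 and 5 move to B (the split itself is hypothetical).

```sql
-- Stop replication between A and B first: ALTER TABLE statements are
-- replicated, so a DROP PARTITION on A would otherwise also run on B.

-- On server A (keeps tenants 1-3): discard the tenants that moved to B.
ALTER TABLE Sale DROP PARTITION t4, t5;

-- On server B (now serves tenants 4-5): discard the tenants left on A.
ALTER TABLE Sale DROP PARTITION t1, t2, t3;
```

The same pair of statements would be repeated for every tenant-partitioned table.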

There are other administrative benefits as well, such as quickly dropping the data of a tenant that is no longer active.
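For a single departed tenant (tenant 4 here, hypothetically), the two approaches compare as follows; you would run one or the other, not both.

```sql
-- Option 1, row by row: each row is deleted individually, with undo
-- logging and maintenance of every index, which is slow for a large tenant.
DELETE FROM Sale WHERE tenantId = 4;

-- Option 2, partition drop: discards partition t4 as a whole,
-- typically far faster than deleting its rows one at a time.
ALTER TABLE Sale DROP PARTITION t4;
```

After option 2, no partition accepts tenantId 4, so inserts for that tenant are rejected until a partition for the value 4 is added back.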

A downside is that you have to keep the partitioning list or range current as new tenants are added.  You will probably want to pre-allocate tenant partitions to avoid having to add partitions at the last moment. 
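A minimal sketch of pre-allocation against the Sale table above; tenants 6 to 8 and the partition names are hypothetical.

```sql
-- Pre-allocate partitions for the next tenants before they sign up.
-- Until this runs, an INSERT with tenantId = 6 fails, because no
-- LIST partition accepts the value 6.
ALTER TABLE Sale ADD PARTITION (
   PARTITION t6 VALUES IN (6) ENGINE=InnoDB,
   PARTITION t7 VALUES IN (7) ENGINE=InnoDB,
   PARTITION t8 VALUES IN (8) ENGINE=InnoDB
);
```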

Also be aware of the partitioning limitations, such as the maximum of 1024 partitions per table in MySQL 5.1. With one partition per tenant, this means at most 1024 tenants per database, so if you store more than 1024 tenants in one database you will want to combine multiple tenants into one partition.
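A sketch of grouping several tenants per partition, assuming a trimmed-down hypothetical table (SaleGrouped) and an arbitrary grouping of four tenants per partition.

```sql
-- Hypothetical grouping: several tenants share each partition, so
-- the 1024-partition limit no longer caps the number of tenants.
CREATE TABLE SaleGrouped (
   tenantId int NOT NULL,
   orderId int NOT NULL,
   purchaseAmount decimal(16,2) NOT NULL,
   purchaseDate datetime NOT NULL,
   PRIMARY KEY (tenantId, orderId)
) ENGINE=InnoDB
PARTITION BY LIST (tenantId) (
   PARTITION g1 VALUES IN (1, 2, 3, 4),
   PARTITION g2 VALUES IN (5, 6, 7, 8),
   PARTITION g3 VALUES IN (9, 10, 11, 12)
);
```

Queries that filter on tenantId still touch a single partition, but removing one tenant now needs a DELETE; only a whole group can be dropped with DROP PARTITION. PARTITION BY HASH(tenantId) would spread tenants automatically and remove the list maintenance, at the cost of losing control over which tenants share a partition.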

If you expect to overwhelm a single database server (quite possible for enterprise software as a service, since even simple enterprise applications seem to generate terabytes of data these days), you should strongly consider partitioning tables by tenant.