Friday, August 29, 2008

Can a timestamp be slower than a datetime?

One should use timestamps rather than datetimes as timestamps are four bytes and datetimes are eight bytes, right?  This makes sense as smaller indexes and tables are faster than larger ones. However, when I was trying to quantify the improvement (I’m strange that way), I found the exact opposite - when used as a criteria, a timestamp is slower than a datetime. Often much slower.

For example, for this simple sql, when purchaseDate was a timestamp, the query took about 9.4 seconds and the datetime version took about 4.3 seconds. In this case the data was in memory.

    select sum(unit) from Sale where purchaseDate >= '2001-07-01' and purchaseDate < '2001-08-01'

I did other tests, such as when data wasn’t cached and there was io, and I found cases where datetimes were much faster than timestamps, differences so extreme I’m hesitant to report them until I have a better explanation of what is going on. I really can’t believe the performance differences I'm measuring and I have this suspicion that something else is occurring as these numbers seem too extreme, but what I can’t say.

Was there a bug for this I don’t know about? It is hard to find information on timestamps as it is such a common word, but I did scan the MySQL bugs for timestamps and didn’t find anything. Is it possible the extra functionality of timestamps, like timezone conversions, makes timestamps slower than datetimes?

In a few other cases, I found them about the same speed; only in one case were timestamps faster than datetimes, and that was 94 seconds for timestamps and 97 seconds for datetimes, which is an insignificant difference.

I tested this on 5.1.26 and 5.0.22, and found the performance difference between timestamps and datetimes on both versions.

I'm not sure what to conclude from this. If you aren’t frequently using the date oriented columns as criteria in queries, and dates are something like a rarely used record of when an event occurred, the smaller timestamp is the better choice. If, however, if many queries are run against the date in question, and assuming these results I’m getting are accurate, then a datetime column should at least be under consideration. In other words, benchmark datetimes and timestamps in your environment with representative queries. The results are worse for timestamps when they are used as a selection criteria.

If you get the crazy performance differences I’m getting, or if you don’t, let me know.


For the curious, the plans for the timestamp queries looked like this:

mysql> explain select sum(unit) from SaleT where purchaseDate >= '2001-07-01'  and purchaseDate < '2001-08-01' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: SaleT
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 20267520
        Extra: Using where
1 row in set (0.04 sec)

And the datetime queries looked like this:

mysql> explain select sum(unit) from Sale where purchaseDate >= '2001-07-01'  and purchaseDate < '2001-08-01' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Sale
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 20548080
        Extra: Using where
1 row in set (0.06 sec)

And the table looks like this (with the other version having a purchaseDate timestamp column instead of a datetime).


CREATE TABLE  `test`.`Sale` (
  `orderId` int(11) NOT NULL,
  `customerId` int(11) NOT NULL,
  `productId` int(11) NOT NULL,
  `productBigId` int(11) NOT NULL,
  `unit` int(11) NOT NULL,
  `purchaseAmount` decimal(16,2) NOT NULL,
  `purchaseCost` decimal(16,2) NOT NULL,
  `purchaseDate` datetime NOT NULL,
  PRIMARY KEY  (`purchaseDate`,`orderId`),
  UNIQUE KEY `idx_sale_order` (`orderId`),
  KEY `pf_sale_product` (`productId`),
  KEY `pf_sale_customer` (`customerId`),
  CONSTRAINT `pf_sale_customer` FOREIGN KEY (`customerId`) REFERENCES `Customer` (`customerId`),
  CONSTRAINT `pf_sale_product` FOREIGN KEY (`productId`) REFERENCES `Product` (`productId`)


On a somewhat related note, I also got this repeatable bug when trying to insert data into a timestamp column on 5.1.26:

    Incorrect datetime value: '2001-03-11 02:00:39' for column 'purchaseDate' at row 226


Anonymous said...

I bet this case is unique to using a timestamp column as the leftmost part of an InnoDB clustered index.

To try to isolate this, how about repeating the test by creating otherwise identical tables, but change the TIMESTAMP column to INT and the DATETIME column to BIGINT, populating them by using UNIX_TIMESTAMP() to preserve distribution.

And the invalid timestamp error screams of time zone conversion involving daylight saving time, even though the US didn't switch to DST on that date.

Since you must be using a US time zone, try rerunning all the tests after setting the time zone in use to UTC.

Anonymous said...

I was actually wondering about the same thing today.. And wondered if it was a wise idea to go with INTs altogher..

John Dzilvelis said...

After reading your blog I setup a test and found similar results...and elevated processor utilization. I suspect that there is some overhead of converting the timestamps to dates for display. I interpret the manual as saying that timestamp data is stored as an integer number in the database, so thats a conversion that has to be done by the engine.

if you change your where clause to use:

...purchaseDate >= TIMESTAMP('2001-07-01')+0 may notice a n improvement in performance as I did.

I generally advise development teams to avoid the timestamp datatype in MySQL for business data such as a purchase date since the datetime datatype is much more universal for this.

Anonymous said...

That bug BTW looks very much like a Daylight Saving problem I've experienced in migrations before.

Anonymous said...

I found the same results two weeks ago on MySQL 5.0.32 using plani Myisam tables.

I was on the way to turn some datetime's into timestamps, but this effect allowed me to scratch that TODO item off the list without any work :)

Still find it strange though, and curious to hear about the reason!

Unknown said...

Same results here with InnoDB range queries on a non-indexed timestamp column, difference of performance is 260% faster with datetime.

Anonymous said...

Hi there,

Can you do some news tests?

Okay, you've created timestamp column in table schema, but you use human readable date specification in query.

What about if you use timestamp in the queries?

SELECT name WHERE registered BETWEEN 1124448300 AND 1225557200