Database Science: February 2009

Wednesday, February 25, 2009

Is SQL Slow?

Last time I demonstrated a case where stored procedures are slow when they have to do computationally expensive work. The more interesting, to me at least, question is how slow is SQL? The answer to that is far more complex.

For example, this simple SQL takes 2.2 seconds to run on MySQL. This is a painfully faster than the 696 seconds it took a stored procedure to produce similar results.

select customerName, count(*)
from Salet s
join customer c on s.customerId = c.customerId
group by customerName

As demonstrated in the previous article, the equivalent C# code took 1.4 seconds to produce the same results. Some may find it surprising that it take less time to ship 1 million rows out of the database and then summarize it in code than it does for MySQL to summarize the same data in the database.

In this simple case the performance difference isn’t much and is not worth the extra code complexity. For more complex SQL, with more joins, perhaps nested case statements and temp tables, and similar standard SQL techniques, it is often much faster to do the transformation logic in non-SQL code. I’ve found cases where I was able to improve performance by over ten times by using C# or Java code over SQL. Still, my inclination is to always see if I can’t get acceptable performance from SQL first as less code is generally required, and usually far less code. SQL is an exceptionally expressive language for certain types of problems.

Plus, the performance advantage of C# or Java won’t be true in all cases. Stating the obvious, shipping out all the sales data won’t make sense for more selective queries that query only a few sales. In this case it makes far more sense to write in SQL.

Deciding where to execute code, in the database or elsewhere, is a complex problem that would require a series of articles to answer reasonably (a hint, think about using sequential IO as much as possible). For now I just want to point out that running everything in SQL isn’t always the best performing method.

Monday, February 23, 2009

Stored Procedures Are Slow Part 2

Last time I demonstrated a case where stored procedures are slow. There were a few comments that I should include selects in the stored procedure to make the tests more realistic. From experience I already knew the answer so I didn’t go into that level of detail, but since stored procedures are all about database access this is a reasonable comment.

This is a simple stored procedure that selects data and then summarizes it by customer. No one would actually write this as it is far too easy to use a SQL statement instead. Assume the logic is more complex and can’t be done easily by the standard SQL techniques of case statements, temp tables, etc, and then this makes more sense.

The end result is this stored procedure takes 696 seconds to run.

create procedure slowCounter()
begin

    declare done int default 0;
    declare curCustomerId int default 0;
    declare customerCount int default 0;
    declare slowCur cursor for select customerId from saleT;
    declare continue handler for not found
        set done = TRUE;

    create temporary table tempCount
    (
        tempCustomerId int not null,
        tempCustomerCount int not null,
        primary key (tempCustomerId)
    );

    open slowCur;
    fetch slowCur into curCustomerId ;
    while not done do
        insert into tempCount(tempCustomerId, tempCustomerCount)
            values (curCustomerId , 1)
        on duplicate key update tempCustomerCount = tempCustomerCount + 1;
        fetch slowCur into curCustomerId ;
    end while;

close slowCur;

select * from tempCount;

end

This more complex C# code (and still not fully exception proofed) only takes 1.4 seconds to run. That is nearly 1/500 the time it took the stored procedure to execute.

using System;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

namespace Summarization
{
    /// <summary>
    /// Tests the performance of C# to simulate a SQL group by
    /// </summary>
    class SumTestGroup : BaseSum
    {
        protected MySqlConnection sqlConn;

        private Dictionary<Int32, String> customer =
            new Dictionary<Int32, String>();
        private Dictionary<String, Int32> customerRowCount =
            new Dictionary<String, Int32>();

        public void execute()
        {
            try
            {
                sqlConn = getConnectionMySql();
                sqlConn.Open();
                int count = 0;
                Int32 customerId;
                String customerName;

                getCustomers();

                MySqlCommand cmd = new MySqlCommand("select s.customerId from Salet s", sqlConn);
                MySqlDataReader rdr = cmd.ExecuteReader();

                while (rdr.Read())
                {
                    customerId = (Int32)rdr["customerId"];
                    customerName = customer[customerId];
                    count++;

                    if (customerRowCount.ContainsKey(customerName))
                    {
                        customerRowCount[customerName]++;
                    }
                    else
                    {
                        customerRowCount[customerName] = 1;
                    }
                    //Console.WriteLine("Count: " + count);
                }

                foreach (KeyValuePair<String, Int32> customerRow in customerRowCount)
                {
                    Console.WriteLine("Customer: " + customerRow.Key + " Customer Count: " + customerRow.Value);
                }
                rdr.Close();

            }
            finally
            {
                sqlConn.Close();
            }
        }

        private void getCustomers()
        {
            MySqlCommand cmd = new MySqlCommand("select customerId, customerName from customer", sqlConn);
            MySqlDataReader rdr = cmd.ExecuteReader();

            Int32 customerId;
            String customerName;

            while (rdr.Read())
            {
                if (!rdr.IsDBNull(0))
                {
                    customerId = (Int32)rdr["customerId"];
                    customerName = (String)rdr["customerName"];
                    customer.Add(customerId, customerName);
                }
            }
            rdr.Close();
        }
    }
}

Both produce results like this:

customer Name 0, 100287
customer Name 1, 100009
customer Name 2, 100130
customer Name 3, 99655
customer Name 4, 100134
customer Name 5, 100102
customer Name 6, 99854
customer Name 7, 100172
customer Name 8, 99846
customer Name 9, 99812

In this case, the effort required to retrieve one million rows out of the database and then process them takes orders of magnitude less than a stored procedure takes to process the data inside the database.

No one would actually write this code when a simple SQL alternative is possible. But given complex logic that isn’t possible or easy to express in SQL that needs to be applied against a large volume of data, using a stored procedure cursor (or loop in general) isn’t a fast way to go. This is odd as the whole point of stored procedures is to write code that deals with data contained in a database.

Friday, February 13, 2009

Are MySQL stored procedures slow?

Yes, if compared to code in Java or C#. For example, this overly simple code took 284 seconds.

CREATE PROCEDURE CountTest()
begin
    declare counter int default 0;
    select now();
    repeat
        set counter = counter + 1;
    until counter > 120000000
    end repeat;
    select counter;
    select now();
end

Ignoring my off by one error, here is equivalent code in C# (the language I’m currently learning). It took 419 milliseconds, or MySQL took 677 times longer to execute. From my experience, Java isn’t going to be any slower.

int counter = 0;
while (counter < 120000000)
{
counter++;
}

Slow stored procedure performance is one of the reasons why it usually isn’t wise to implement computationally expensive business logic in the database. With networks not being the bottleneck they once were it is often better to extract the data to the application layer and process it in the programming language of your choice, even if that takes a few extra round trips to the database. There are exceptions where those extra round trips are too numerous and prohibitively expensive. With something as complex as a database there are always exceptions.

This example is a bit too simple, but, based on experience, it is representative of the general performance of stored procedure logic.

How can MySQL get away with such poor performance? Well, the equivalent code in MS SQL Server took 80 seconds on the same hardware, which is also two orders of magnitude slower than C# or Java code.

create procedure testCount
as
begin
    declare @countNo int;
    set @countNo = 0;
    while (@countNo < 120000000)
    begin
        set @countNo = @countNo + 1
    end
    select @countNo
end

MS SQL Server code runs 3.5 times faster than MySQL, but given how much older MS SQL Server is, MySQL is doing well here. So are MySQL Stored procedures slow? Not really, if compared to other databases I’ve used.

Just don’t use them to do computationally expensive business logic.

Database Science