Monday, February 23, 2009

Stored Procedures Are Slow Part 2

Last time I demonstrated a case where stored procedures are slow.  There were a few comments that I should include selects in the stored procedure to make the tests  more realistic.  From experience I already knew the answer so I didn’t go into that level of detail, but since stored procedures are all about database access this is a reasonable comment. 

This is a simple stored procedure that selects data and then summarizes it by customer.  No one would actually write this as it is far too easy to use a SQL statement instead.    Assume the logic is more complex and can’t be done easily by the standard SQL techniques of case statements, temp tables, etc, and then this makes more sense.  

The end result is this stored procedure takes 696 seconds to run. 


create procedure slowCounter()

    declare done int default 0;
    declare curCustomerId int default 0;
    declare customerCount int default 0;
    declare slowCur cursor for select customerId from saleT;
    declare continue handler for not found
        set done = TRUE;

    create temporary table tempCount
        tempCustomerId int not null,
        tempCustomerCount int not null,
        primary key (tempCustomerId)

    open slowCur;
    fetch slowCur into curCustomerId ;
    while not done do
        insert into tempCount(tempCustomerId, tempCustomerCount)
            values (curCustomerId , 1)
        on duplicate key update tempCustomerCount = tempCustomerCount + 1;
        fetch slowCur into curCustomerId ;
    end while;

    close slowCur;

    select * from tempCount;



This more complex C# code (and still not fully exception proofed) only takes 1.4 seconds to run.  That is nearly 1/500 the time it took the stored procedure to execute.  


using System;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

namespace Summarization
    /// <summary>
    /// Tests the performance of C# to simulate a SQL group by
    /// </summary>
    class SumTestGroup : BaseSum
        protected MySqlConnection sqlConn;

        private Dictionary<Int32, String> customer =
            new Dictionary<Int32, String>();
        private Dictionary<String, Int32> customerRowCount =
            new Dictionary<String, Int32>();

        public void execute()
                sqlConn = getConnectionMySql();
                int count = 0;
                Int32 customerId;
                String customerName;


                MySqlCommand cmd = new MySqlCommand("select s.customerId from Salet s", sqlConn);
                MySqlDataReader rdr = cmd.ExecuteReader();

                while (rdr.Read())
                    customerId = (Int32)rdr["customerId"];
                    customerName = customer[customerId];

                    if (customerRowCount.ContainsKey(customerName))
                        customerRowCount[customerName] = 1;
                    //Console.WriteLine("Count: " + count);

                foreach (KeyValuePair<String, Int32> customerRow in customerRowCount)
                    Console.WriteLine("Customer: " + customerRow.Key + " Customer Count: " +  customerRow.Value);


        private void getCustomers()
            MySqlCommand cmd = new MySqlCommand("select customerId, customerName from customer", sqlConn);
            MySqlDataReader rdr = cmd.ExecuteReader();

            Int32 customerId;
            String customerName;

            while (rdr.Read())
                if (!rdr.IsDBNull(0))
                    customerId = (Int32)rdr["customerId"];
                    customerName = (String)rdr["customerName"];
                    customer.Add(customerId, customerName);

Both produce results like this:

customer Name 0, 100287
customer Name 1, 100009
customer Name 2, 100130
customer Name 3, 99655
customer Name 4, 100134
customer Name 5, 100102
customer Name 6, 99854
customer Name 7, 100172
customer Name 8, 99846
customer Name 9, 99812


In this case, the effort required to retrieve one million rows out of the database and then process them takes orders of magnitude less than a stored procedure takes to process the data inside the database.  

No one would actually write this code when a simple SQL alternative is possible.  But given complex logic that isn’t possible or easy to express in SQL that needs to be applied against a large volume of data, using a stored procedure cursor (or loop in general) isn’t a fast way to go.  This is odd as the whole point of stored procedures is to write code that deals with data contained in a database.  


Joe Harris said...

Is this a serious post?

Why would you use a cursor for this? Cursors are a legacy element of SQL that should not be used in modern code. By using a cursor you prevent the database from optimizing the execution of the query.

The whole point of SQL is to state what you want to get and let the database find the best way to get it. It doesn't alway get it right but a cursor *never* will.

Not trying to be a troll here... but if your background is object oriented programming I suggest you do some research on set theory.

OR just read any book by Joe Celko.

dbscience said...

I stated in the article "No one would actually write this code when a simple SQL alternative is possible". It was a simple example designed to show an example of cursors being very slow.

To understand the need for this article see the comments in part 1 where people suggest to do this as they didn't understand stored procs were so slow.

Still, there are cases where problems can't be easily solved using set logic and procedural logic is the better solution, as I also stated in the article. In those cases a stored proc cursor is slow and one might want to explore alternatives.

You might also want to see the next article "Is SQL Slow?", as there are cases where SQL is far slower than C# or Java code. I'm not stating anything new here as Ralph Kimball pointed this out a long time ago.