PHP Mysql tips

Continuing from my earlier post on PHP performance, I thought I’d share a few Mysql tips that I’ve learnt over the years. Hope it helps someone and please leave a comment with your own tips or provide any corrections to the ones mentioned.

Word searching

1.

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('Keyword')

(Fastest)

2.

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('+Keyword' IN BOOLEAN MODE)

(Fast)

3.

SELECT * FROM table WHERE RLIKE '(^| +)Keyword($| +)'

OR

SELECT * FROM table WHERE
RLIKE '([[:space:]]|[[:<:]])Keyword([[:space:]]|[[:>:]])'

(Slow)

Contains searching

1.

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('Keyword*' IN BOOLEAN MODE)

(Fastest)

2.

SELECT * FROM table WHERE field LIKE 'Keyword%'

(Fast)

3.

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('*Keyword*' IN BOOLEAN MODE)

(Slow)

4.

SELECT * FROM table WHERE field LIKE '%Keyword%'

(Slow)

Recordsets

1.

SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE Condition LIMIT 0, 10
SELECT FOUND_ROWS()

(Fastest)

2.

SELECT * FROM table WHERE Condition LIMIT 0, 10
SELECT COUNT(PrimaryKey) FROM table WHERE Condition

(Fast)

3.

$result = mysql_query("SELECT * FROM table", $link);
$num_rows = mysql_num_rows($result);

(Very slow)

Joins

Use an INNER JOIN when you want the joining table to only have matching records that you specify in the join. Use LEFT JOIN when it doesn’t matter if the records contain matching records or not.

SELECT * FROM products
INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

Returns all products with a matching supplier.

SELECT * FROM products
LEFT JOIN suppliers ON suppliers.SupplierID = products.SupplierID
WHERE suppliers.SupplierID IS NULL

Returns all products without a matching supplier.

Best practice

1. Always use lowercase for table names. (If you use different OS’s this is a must)
2. Always prepend the table name to the field. E.g. ProductName, SupplierPostCode.
This makes multiple joins very easy.
3. Always create a primary id field with the name of the table followed by the id. e.g. ProductID
4. Index fields used for joins.
5. Use a separate logging table or transactions for logs of table updates, deletes etc.

Comments 26

  1. Richard@Home wrote:

    Good tips, although I’m not too sure about Best Practice #2.

    It seems to contradict #1 (lower case names). Shouldn’t field names be lower case too for the same reason?

    And, what’s up with using product.name, supplier.post_code?

    This becomes even more apparent when you use table aliases and join the same table twice in the same query. In your example, the sql looks like

    SELECT RelatedProduct.ProductName, Product.ProductName FROM …

    instead of:

    SELECT RelatedProduct.Name, ProductName

    (or better yet, SELECT related_product.name, product.name)

    I suspect the ‘real’ reason for appending the table name is that in some (older) DBMS, field names were unique across the whole database. I’m struggling to think of any where this is still the case though.

    Posted 23 Jul 2007 at 1:05 pm
  2. Marc Gear wrote:

    I have to say I disagree with several of your best practice points. I think nomenclature is a per-project thing, and there are arguments for different ways. My preference is to not have the table name prefixing fields in the table, because the table name is implied - I don’t see how it makes ‘joins very easy’ - if anything it makes them longer - with SQL like suppliers.SupplierID = product.SupplierID instead of suppliers.id = product.supplier_id. The same goes for your point on the primary key - why add the prefix? the membership of the table is implied. You should certianly prefix the foreign keys with the table name, otherwise you wont know which ‘id’ field it is you’re looking at :)

    I would always recommend using lowercase field names too, with words separated by underscores. The reason for this is that mixing camelCase and lcase underscores in the same DB looks odd, and by keeping your names in lowercase you can easily identify field names in your SQL if you keep SQL keywords in upper case.

    Posted 23 Jul 2007 at 1:18 pm
  3. Gareth Heyes wrote:

    @Richard@Home

    Lowercase table names are important because on some systems the table name is case sensitive and can result in SQL errors when transporting your system to the new OS.

    I have programmed many systems which need to be on different servers and demo laptops which can result in a huge headache.

    The reason for pre-pending the table name to the fields is that instantly another developer will understand your database schema very easily.

    Posted 23 Jul 2007 at 1:50 pm
  4. Gareth Heyes wrote:

    @Marc Gear

    Yeah underscores would be better but I have a habit of not using them because of a current project I’m developing (which doesn’t use underscores). I find lowercase table fields don’t work for me because I find the name easier to remember if they are captitalised and I tend to do less mistakes.

    Posted 23 Jul 2007 at 1:55 pm
  5. Gareth Heyes wrote:

    @Richard@Home

    I may have given a bad example in the post, consider the following example:-

    SELECT ProductDescription, SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID
    

    instead of

    SELECT products.Description AS ProductDescription, suppliers.description AS SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID
    
    Posted 23 Jul 2007 at 2:01 pm
  6. Tomek wrote:

    on the recordsets - #2 might actually be faster than #1 with query cache being turned on.
    http://drupal.org/node/38297#comment-147794

    Posted 23 Jul 2007 at 3:29 pm
  7. Gareth Heyes wrote:

    @Tomek

    Interesting I’ve always found the other to be faster but I’ll try it out in future because I do have a large database which performs some advanced sql queries.

    Thanks for the link!

    Posted 23 Jul 2007 at 3:37 pm
  8. Whis wrote:

    Yes, but you didn’t wrote one think…it works only with MyISAM :P
    Quote from manual “Full-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.”

    Posted 23 Jul 2007 at 8:11 pm
  9. Gareth Heyes wrote:

    @Whis

    Thanks for pointing that out, I completely forgot to include it in my post :)

    Posted 23 Jul 2007 at 9:20 pm
  10. Chris wrote:

    SELECT products.Description AS ProductDescription, suppliers.description AS SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

    is going to get real long real fast if you include more tables.

    Use table aliases:

    SELECT
    p.Description AS ProductDescription,
    s.description AS SupplierDescription
    FROM products p INNER JOIN suppliers s ON s.SupplierID = p.SupplierID

    Posted 24 Jul 2007 at 3:41 am
  11. Solidus Obscura wrote:

    Three questions… Both in regards to a future (currently in development) site.

    1. My site will make heavy use of pagination (it’s a custom implementation of a forum/priv. msg/etc.). Currently I do a “SELECT COUNT(*) FROM tbl(s) WHERE condition” and then, after running a function called “make_page_nav” which simultaneously creates the pages array and suitable LIMIT condition, I run the same query as above, this time selecting the appropriate rows within the condition.

    Your “fast” Resultset queries seem to do the opposite, grab the data first, and then do a count (or FOUND_ROWS()). Is this implementation faster, by virtue of not scanning the table twice? (I had figured doing the count query would cache the query, by which the full select would feed from). But if No. 2 is faster with query caching than No. 1, would it even make a difference which order the count and datagrab queries are in?

    2. My site will enable searching of multiple keywords or phrases against a single field. The way I have it now, supposing the search was: apples OR oranges would translate to SELECT * FROM tbl WHERE field LIKE ‘%apples%’ OR field LIKE ‘%oranges%’
    Where if either term is found in the text field of a row, than that row is returned. Your first two “fast” contains matching queries seem to only match the beginning of the text field to the keyword (which to me seems why that would be faster than trying to find the keyword ANYWHERE in the field). I haven’t been exposed to the MATCH (`field`) AGAINST (’*Keyword*’ IN BOOLEAN MODE) syntax, but what is the benefit of using that, and can it be stringed with OR’s?

    3. I tend to use stringing by OR’s a lot as well, when matching a set of ID’s to a table.. For example, I store a set of moderators for a forum as |5|27|151| where the numbers are the UserId’s of the moderator. I have a function that converts that list to an array, and a function that converts the array to an ‘orchunk’, ex. (userid=5 OR userid=27 OR userid=151). I haven’t tried to use any “set” functions with mysql/php, rather storing sets as strings and using standardized (for my site) PHP functions to convert back and forth to arrays. Is this sort of tactic efficient?

    Answers or links to suitable literature is much appreciated. I’d run performance tests, but since my site isn’t operational to the expected 40-100 concurrent users, I can’t trust the results. The worst case table size would be 100,000 rows (where the above LIKE searches could be run on).

    Posted 07 Aug 2007 at 6:53 pm
  12. Gareth Heyes wrote:

    Hi Solidus

    Great questions!

    1. My understanding of the FOUND_ROWS() call is that Mysql stores the recordcount in memory so it doesn’t run the query twice. In some instances the SELECT COUNT method can be faster, your best bet is to experiment between the two. I changed my database classes from one method to the other and noticed a good improvement so I stuck with FOUND_ROWS().
    2. The MATCH AGAINST facility of mysql uses FULL TEXT indexing so you will notice a huge improvement on a large body of text for example.
    3. In that instance I would use a JOIN to get the results. When I develop multiple user levels, I usually have a UserID or GroupID field which links together the particular table ,user or group.

    LIKE searches will run fine on a recordset of 100,000 records if you design your tables correctly but once you start hitting bigger numbers I think it would be best to use FULL TEXT searching IMO.

    Posted 07 Aug 2007 at 8:35 pm
  13. Solidus Obscura wrote:

    So you set an index on the text field then, and run that sort of query against them? (Sorry if that sounds silly, I’m just accustomed to using numerical indexes). And, would that work on MEDIUMTEXT fields (According to a previous poster… “Full-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.”)? I generally try to use TEXT fields whenever possible, but there are one or two body fields in my db that require the maximum of a MEDIUMTEXT (more than 65535 chars).

    Posted 08 Aug 2007 at 12:51 pm
  14. Gareth Heyes wrote:

    Yes they can only be applied to CHAR, VARCHAR and TEXT fields as the poster mentioned and with MyISAM tables only. You will notice a huge speed improvement for lots of fields with TEXT columns.

    With MEDIUMTEXT fields or bigger I suggest making a indexing program which will only gather keywords of the field and place it in a smaller field thus allowing you to search the keywords instead of the MEDIUMTEXT fields.

    More information on FULL TEXT is available here:-
    http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

    Posted 08 Aug 2007 at 1:03 pm
  15. Solidus Obscura wrote:

    Dunno if I’ll do an implementation of keywords, especially since most of the content will be random postings. More probable will be keywords on user-posted fanarts/fictions which make more sense to search by keyword instead of full text (which the fields will be of the MEDIUMTEXT variety, and you can’t search the content of an image… yet). I’ll keep the tip in mind (glad I happened upon this post).

    The thing with the user levels/groups is that the set I search upon can be a random set. In my example, there can be a set of moderators stored as |1|2|3|, and these are set by an admin through a web-based control panel. They can be unrelated to a group, so to get the usernames of these mods, I’d do a SELECT userid,username FROM user WHERE (userid=1 OR userid = 2 OR userid = 3) ORDER BY username where the “or string” is generated by a function. I was curious if there was a more efficient way of doing this. It just dawned on me that it might be better to do a userid IN (1,2,3)…

    Posted 08 Aug 2007 at 2:27 pm
  16. Gareth Heyes wrote:

    Yeah IN is much faster, you can also use it with SUB QUERIES as well. For example SELECT * FROM table1 WHERE Field1 IN (SELECT Field2 FROM table2) worth knowing ;)

    Posted 08 Aug 2007 at 2:52 pm
  17. Solidus Obscura wrote:

    Yeah, I did sub-queries a lot in a previous job which is why it just occurred to me, since (1,2,3) is really just like a result set of a sub-query. I actually just did a performance test with a different site I’m working on and found little difference in load times. I’ll have to do more testing (preferably with more concurrent users).

    Thanks for all the tips. Of course, I should and will performance test all of the advice, but it’s nice knowing things you didn’t before.

    Posted 08 Aug 2007 at 3:01 pm
  18. Gareth Heyes wrote:

    Yep performance test everything because it can vary between datasets a lot, I found some things a lot faster than others but you might not get the same results based on your data so keep that in mind. I’m glad the blog helped :)

    Posted 08 Aug 2007 at 3:45 pm
  19. asd wrote:

    phpinfo();

    Posted 02 Sep 2007 at 4:55 pm
  20. Gareth Heyes wrote:

    ooooo ya got me LOL

    ENDSARCASM.

    Posted 02 Sep 2007 at 5:14 pm
  21. Adam wrote:

    Very nice, I wrote a similar article on TalkPHP.com and I was amazed to find that a lot of programmers who used MySQL were blissfully unaware of such statements - like if statements! How can they not know?

    Posted 13 Nov 2007 at 4:13 pm
  22. Tom wrote:

    nice tips , i’m using PHP for 5 years but didn’t know all your examples !

    I was looking for a lowercase query and found somewhere else :

    select * FROM hardware WHERE name regexp ‘$name’

    REGEXP isn’t case sensitive. So, problem solved kinda…

    Posted 29 Feb 2008 at 1:38 pm
  23. Bijay Rungta wrote:

    Ho do you propose to achieve #5

    5. Use a separate logging table or transactions for logs of table updates, deletes etc.

    Any recommendation for the database structure for a general table????

    Posted 19 Jul 2008 at 11:34 am
  24. Gareth Heyes wrote:

    @Bijay

    I wrote a email management system log which had the following structure:-
    ID
    TableName
    PrimaryKeyName
    Action
    LogDate
    UserID
    ClientID

    This allows you to ask questions like How many times to user X update table X on whatever date.

    Posted 21 Jul 2008 at 8:17 am
  25. Giulio wrote:

    Thanks, join is always a pain in the ***.
    Check this tutorial on joins for some more info http://buup.mephir.com/read/11/sql-join-close-up-what-when-and-how.html

    Posted 06 Jun 2009 at 1:16 pm
  26. vishal wrote:

    Very Good Practise… i have Done this Practially its works perfeclty

    Posted 28 Nov 2009 at 5:37 am

Post a Comment

Your email is never published nor shared. Required fields are marked *

Comment spam protected by SpamBam