PHP Mysql tips

Monday, 23 July 2007

Continuing from my earlier post on PHP performance, I thought I’d share a few Mysql tips that I’ve learnt over the years. Hope it helps someone and please leave a comment with your own tips or provide any corrections to the ones mentioned.

Word searching

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('Keyword')

(Fastest)

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('+Keyword' IN BOOLEAN MODE)

(Fast)

SELECT * FROM table WHERE RLIKE '(^| +)Keyword($| +)'

SELECT * FROM table WHERE 
RLIKE '([[:space:]]|[[:<:]])Keyword([[:space:]]|[[:>:]])'

(Slow)

Contains searching

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('Keyword*' IN BOOLEAN MODE)

(Fastest)

SELECT * FROM table WHERE field LIKE 'Keyword%'

(Fast)

SELECT * FROM table WHERE MATCH (`field`) AGAINST ('*Keyword*' IN BOOLEAN MODE)

(Slow)

SELECT * FROM table WHERE field LIKE '%Keyword%'

(Slow)

Recordsets

SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE Condition LIMIT 0, 10
SELECT FOUND_ROWS()

(Fastest)

SELECT * FROM table WHERE Condition LIMIT 0, 10
SELECT COUNT(PrimaryKey) FROM table WHERE Condition

(Fast)

$result = mysql_query("SELECT * FROM table", $link);
$num_rows = mysql_num_rows($result);

(Very slow)

Joins

Use an INNER JOIN when you want the joining table to only have matching records that you specify in the join. Use LEFT JOIN when it doesn’t matter if the records contain matching records or not.

SELECT * FROM products 
INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

Returns all products with a matching supplier.

SELECT * FROM products 
LEFT JOIN suppliers ON suppliers.SupplierID = products.SupplierID 
WHERE suppliers.SupplierID IS NULL

Returns all products without a matching supplier.

Best practice

1. Always use lowercase for table names. (If you use different OS’s this is a must)
2. Always prepend the table name to the field. E.g. ProductName, SupplierPostCode.
This makes multiple joins very easy.
3. Always create a primary id field with the name of the table followed by the id. e.g. ProductID
4. Index fields used for joins.
5. Use a separate logging table or transactions for logs of table updates, deletes etc.

The entry 'PHP Mysql tips' was posted on July 23rd, 2007 at 10:38 am and is filed under mysql, php. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

26 Responses to “PHP Mysql tips”

Richard@Home writes:
No. 1 — July 23rd, 2007 at 1:05 pm
Good tips, although I’m not too sure about Best Practice #2.

It seems to contradict #1 (lower case names). Shouldn’t field names be lower case too for the same reason?

And, what’s up with using product.name, supplier.post_code?

This becomes even more apparent when you use table aliases and join the same table twice in the same query. In your example, the sql looks like

SELECT RelatedProduct.ProductName, Product.ProductName FROM …

instead of:

SELECT RelatedProduct.Name, ProductName

(or better yet, SELECT related_product.name, product.name)

I suspect the ‘real’ reason for appending the table name is that in some (older) DBMS, field names were unique across the whole database. I’m struggling to think of any where this is still the case though.
Marc Gear writes:
No. 2 — July 23rd, 2007 at 1:18 pm
I have to say I disagree with several of your best practice points. I think nomenclature is a per-project thing, and there are arguments for different ways. My preference is to not have the table name prefixing fields in the table, because the table name is implied – I don’t see how it makes ‘joins very easy’ – if anything it makes them longer – with SQL like suppliers.SupplierID = product.SupplierID instead of suppliers.id = product.supplier_id. The same goes for your point on the primary key – why add the prefix? the membership of the table is implied. You should certianly prefix the foreign keys with the table name, otherwise you wont know which ‘id’ field it is you’re looking at 🙂

I would always recommend using lowercase field names too, with words separated by underscores. The reason for this is that mixing camelCase and lcase underscores in the same DB looks odd, and by keeping your names in lowercase you can easily identify field names in your SQL if you keep SQL keywords in upper case.
Gareth Heyes writes:
No. 3 — July 23rd, 2007 at 1:50 pm
@Richard@Home

Lowercase table names are important because on some systems the table name is case sensitive and can result in SQL errors when transporting your system to the new OS.

I have programmed many systems which need to be on different servers and demo laptops which can result in a huge headache.

The reason for pre-pending the table name to the fields is that instantly another developer will understand your database schema very easily.
Gareth Heyes writes:
No. 4 — July 23rd, 2007 at 1:55 pm
@Marc Gear

Yeah underscores would be better but I have a habit of not using them because of a current project I’m developing (which doesn’t use underscores). I find lowercase table fields don’t work for me because I find the name easier to remember if they are captitalised and I tend to do less mistakes.

Gareth Heyes writes:
No. 5 — July 23rd, 2007 at 2:01 pm

@Richard@Home

I may have given a bad example in the post, consider the following example:-

SELECT ProductDescription, SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

instead of

SELECT products.Description AS ProductDescription, suppliers.description AS SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

Tomek writes:
No. 6 — July 23rd, 2007 at 3:29 pm
on the recordsets – #2 might actually be faster than #1 with query cache being turned on.
http://drupal.org/node/38297#comment-147794
Gareth Heyes writes:
No. 7 — July 23rd, 2007 at 3:37 pm
@Tomek

Interesting I’ve always found the other to be faster but I’ll try it out in future because I do have a large database which performs some advanced sql queries.

Thanks for the link!
Whis writes:
No. 8 — July 23rd, 2007 at 8:11 pm
Yes, but you didn’t wrote one think…it works only with MyISAM 😛
Quote from manual “Full-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.”
Gareth Heyes writes:
No. 9 — July 23rd, 2007 at 9:20 pm
@Whis

Thanks for pointing that out, I completely forgot to include it in my post 🙂
Chris writes:
No. 10 — July 24th, 2007 at 3:41 am
SELECT products.Description AS ProductDescription, suppliers.description AS SupplierDescription FROM products INNER JOIN suppliers ON suppliers.SupplierID = products.SupplierID

is going to get real long real fast if you include more tables.

Use table aliases:

SELECT
p.Description AS ProductDescription,
s.description AS SupplierDescription
FROM products p INNER JOIN suppliers s ON s.SupplierID = p.SupplierID
Solidus Obscura writes:
No. 11 — August 7th, 2007 at 6:53 pm
Three questions… Both in regards to a future (currently in development) site.

1. My site will make heavy use of pagination (it’s a custom implementation of a forum/priv. msg/etc.). Currently I do a “SELECT COUNT(*) FROM tbl(s) WHERE condition” and then, after running a function called “make_page_nav” which simultaneously creates the pages array and suitable LIMIT condition, I run the same query as above, this time selecting the appropriate rows within the condition.

Your “fast” Resultset queries seem to do the opposite, grab the data first, and then do a count (or FOUND_ROWS()). Is this implementation faster, by virtue of not scanning the table twice? (I had figured doing the count query would cache the query, by which the full select would feed from). But if No. 2 is faster with query caching than No. 1, would it even make a difference which order the count and datagrab queries are in?

2. My site will enable searching of multiple keywords or phrases against a single field. The way I have it now, supposing the search was: apples OR oranges would translate to SELECT * FROM tbl WHERE field LIKE ‘%apples%’ OR field LIKE ‘%oranges%’
Where if either term is found in the text field of a row, than that row is returned. Your first two “fast” contains matching queries seem to only match the beginning of the text field to the keyword (which to me seems why that would be faster than trying to find the keyword ANYWHERE in the field). I haven’t been exposed to the MATCH (`field`) AGAINST (‘*Keyword*’ IN BOOLEAN MODE) syntax, but what is the benefit of using that, and can it be stringed with OR’s?

3. I tend to use stringing by OR’s a lot as well, when matching a set of ID’s to a table.. For example, I store a set of moderators for a forum as |5|27|151| where the numbers are the UserId’s of the moderator. I have a function that converts that list to an array, and a function that converts the array to an ‘orchunk’, ex. (userid=5 OR userid=27 OR userid=151). I haven’t tried to use any “set” functions with mysql/php, rather storing sets as strings and using standardized (for my site) PHP functions to convert back and forth to arrays. Is this sort of tactic efficient?

Answers or links to suitable literature is much appreciated. I’d run performance tests, but since my site isn’t operational to the expected 40-100 concurrent users, I can’t trust the results. The worst case table size would be 100,000 rows (where the above LIKE searches could be run on).
Gareth Heyes writes:
No. 12 — August 7th, 2007 at 8:35 pm
Hi Solidus

Great questions!

1. My understanding of the FOUND_ROWS() call is that Mysql stores the recordcount in memory so it doesn’t run the query twice. In some instances the SELECT COUNT method can be faster, your best bet is to experiment between the two. I changed my database classes from one method to the other and noticed a good improvement so I stuck with FOUND_ROWS().
2. The MATCH AGAINST facility of mysql uses FULL TEXT indexing so you will notice a huge improvement on a large body of text for example.
3. In that instance I would use a JOIN to get the results. When I develop multiple user levels, I usually have a UserID or GroupID field which links together the particular table ,user or group.

LIKE searches will run fine on a recordset of 100,000 records if you design your tables correctly but once you start hitting bigger numbers I think it would be best to use FULL TEXT searching IMO.
Solidus Obscura writes:
No. 13 — August 8th, 2007 at 12:51 pm
So you set an index on the text field then, and run that sort of query against them? (Sorry if that sounds silly, I’m just accustomed to using numerical indexes). And, would that work on MEDIUMTEXT fields (According to a previous poster… â€œFull-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.â€)? I generally try to use TEXT fields whenever possible, but there are one or two body fields in my db that require the maximum of a MEDIUMTEXT (more than 65535 chars).
Gareth Heyes writes:
No. 14 — August 8th, 2007 at 1:03 pm
Yes they can only be applied to CHAR, VARCHAR and TEXT fields as the poster mentioned and with MyISAM tables only. You will notice a huge speed improvement for lots of fields with TEXT columns.

With MEDIUMTEXT fields or bigger I suggest making a indexing program which will only gather keywords of the field and place it in a smaller field thus allowing you to search the keywords instead of the MEDIUMTEXT fields.

More information on FULL TEXT is available here:-
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Solidus Obscura writes:
No. 15 — August 8th, 2007 at 2:27 pm
Dunno if I’ll do an implementation of keywords, especially since most of the content will be random postings. More probable will be keywords on user-posted fanarts/fictions which make more sense to search by keyword instead of full text (which the fields will be of the MEDIUMTEXT variety, and you can’t search the content of an image… yet). I’ll keep the tip in mind (glad I happened upon this post).

The thing with the user levels/groups is that the set I search upon can be a random set. In my example, there can be a set of moderators stored as |1|2|3|, and these are set by an admin through a web-based control panel. They can be unrelated to a group, so to get the usernames of these mods, I’d do a SELECT userid,username FROM user WHERE (userid=1 OR userid = 2 OR userid = 3) ORDER BY username where the “or string” is generated by a function. I was curious if there was a more efficient way of doing this. It just dawned on me that it might be better to do a userid IN (1,2,3)…
Gareth Heyes writes:
No. 16 — August 8th, 2007 at 2:52 pm
Yeah IN is much faster, you can also use it with SUB QUERIES as well. For example SELECT * FROM table1 WHERE Field1 IN (SELECT Field2 FROM table2) worth knowing 😉
Solidus Obscura writes:
No. 17 — August 8th, 2007 at 3:01 pm
Yeah, I did sub-queries a lot in a previous job which is why it just occurred to me, since (1,2,3) is really just like a result set of a sub-query. I actually just did a performance test with a different site I’m working on and found little difference in load times. I’ll have to do more testing (preferably with more concurrent users).

Thanks for all the tips. Of course, I should and will performance test all of the advice, but it’s nice knowing things you didn’t before.
Gareth Heyes writes:
No. 18 — August 8th, 2007 at 3:45 pm
Yep performance test everything because it can vary between datasets a lot, I found some things a lot faster than others but you might not get the same results based on your data so keep that in mind. I’m glad the blog helped 🙂
asd writes:
No. 19 — September 2nd, 2007 at 4:55 pm
phpinfo();
Gareth Heyes writes:
No. 20 — September 2nd, 2007 at 5:14 pm
ooooo ya got me LOL

ENDSARCASM.
Adam writes:
No. 21 — November 13th, 2007 at 4:13 pm
Very nice, I wrote a similar article on TalkPHP.com and I was amazed to find that a lot of programmers who used MySQL were blissfully unaware of such statements – like if statements! How can they not know?
Tom writes:
No. 22 — February 29th, 2008 at 1:38 pm
nice tips , i’m using PHP for 5 years but didn’t know all your examples !

I was looking for a lowercase query and found somewhere else :

select * FROM hardware WHERE name regexp ‘$name’

REGEXP isn’t case sensitive. So, problem solved kinda…
Bijay Rungta writes:
No. 23 — July 19th, 2008 at 11:34 am
Ho do you propose to achieve #5

5. Use a separate logging table or transactions for logs of table updates, deletes etc.

Any recommendation for the database structure for a general table????
Gareth Heyes writes:
No. 24 — July 21st, 2008 at 8:17 am
@Bijay

I wrote a email management system log which had the following structure:-
ID
TableName
PrimaryKeyName
Action
LogDate
UserID
ClientID

This allows you to ask questions like How many times to user X update table X on whatever date.
Giulio writes:
No. 25 — June 6th, 2009 at 1:16 pm
Thanks, join is always a pain in the ***.
Check this tutorial on joins for some more info http://buup.mephir.com/read/11/sql-join-close-up-what-when-and-how.html
vishal writes:
No. 26 — November 28th, 2009 at 5:37 am
Very Good Practise… i have Done this Practially its works perfeclty

PHP Mysql tips

Word searching

Contains searching

Recordsets

Joins

Best practice

26 Responses to “PHP Mysql tips”

Inspiration

Recent Comments

July 2007
M	T	W	T	F	S	S
« Jun				Aug »
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31