Disk seeks are a major performance bottleneck. This problem becomes more apparent when the amount of data starts to grow so large that effective caching becomes impossible. For large databases where you access data more or less randomly, you can be sure that you need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlinking files to different disks or striping the disks:
Using symbolic links
This means that, for MyISAM
tables, you
symlink the index file and/or data file from their usual
location in the data directory to another disk (that may
also be striped). This makes both the seek and read times
better, assuming that the disk is not used for other
purposes as well. See Sección 7.6.1, “Utilizar enlaces simbólicos”.
Striping
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the Nth block on the (N mod number_of_disks) disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned), you get much better performance. Striping is very dependent on the operating system and the stripe size, so benchmark your application with different stripe sizes. See Sección 7.1.5, “Usar pruebas de rendimiento (benchmarks) propios”.
The speed difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks, you may get differences measured in orders of magnitude. You have to choose to optimize for random or sequential access.
For reliability you may want to use RAID 0+1 (striping plus mirroring), but in this case, you need 2*N drives to hold N drives of data. This is probably the best option if you have the money for it. However, you may also have to invest in some volume-management software to handle it efficiently.
A good option is to vary the RAID level according to how critical a type of data is. For example, store semi-important data that can be regenerated on a RAID 0 disk, but store really important data such as host information and logs on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes, due to the time required to update the parity bits.
On Linux, you can get much more performance by using
hdparm
to configure your disk's interface.
(Up to 100% under load is not uncommon.) The following
hdparm
options should be quite good for
MySQL, and probably for many other applications:
hdparm -m 16 -d 1
Note that performance and reliability when using this command
depend on your hardware, so we strongly suggest that you test
your system thoroughly after using hdparm
.
Please consult the hdparm
manual page for
more information. If hdparm
is not used
wisely, filesystem corruption may result, so back up
everything before experimenting!
You can also set the parameters for the filesystem that the database uses:
If you don't need to know when files were last accessed (which
is not really useful on a database server), you can mount your
filesystems with the -o noatime
option.
That skips updates to the last access time in inodes on the
filesystem, which avoids some disk seeks.
On many operating systems, you can set a filesystem to be
updated asynchronously by mounting it with the -o
async
option. If your computer is reasonably stable,
this should give you more performance without sacrificing too
much reliability. (This flag is on by default on Linux.)
Ésta es una traducción del manual de referencia de MySQL, que puede encontrarse en dev.mysql.com. El manual de referencia original de MySQL está escrito en inglés, y esta traducción no necesariamente está tan actualizada como la versión original. Para cualquier sugerencia sobre la traducción y para señalar errores de cualquier tipo, no dude en dirigirse a mysql-es@vespito.com.