Today I reached 109k Queries per Second. I was quite impressed by it.
Some background on the situation.
I developed some stored procedures to process some rather large tables we had in our database.
I managed to get the stored procedures to be very efficient and quick.
I then wanted to test it out and tried to overload the server to see how much it could take.
Normally, the server would do around 1k at best with these kinds of tasks. I have recently been able to tweak it to 20k QPS. But today, for some reason, the cache managed to get itself in the right position and produced this result.
The Server:
A 4+ year old Dell server, with SAS drives, 1 Quad-core CPU and 16Gbs of memory.
Database:
MySQL 5.0.48 - with MyISAM tables only
The Tasks:
Reference a 101 million row table (12+ Gbs) to fill in a column in three 8-9 million row tables (2-5Gbs).
Reference a 700k row table to fill in a 7 million row table.
So 4 tasks at the same time.
Yes, MySQL has transactions if you use InnoDB or NDB Cluster for example. Using these transactional storage engines, you'll have to commit (or roll back) your inserts, deletes or updates.
I've seen it a few times now with people being surprised that no data is going into the tables. It's not so a silly problem in the end. If you are used to the defaults in MySQL you don't have to commit anything since it is automatically done for you.
Take the Python Database Interfaces for MySQL. PEP-249 says that, by default, auto-commit should be turned off. You could turn it back on, but it's good practice to be explicit and commit in your code. Remember the Zen of Python!
Here is just a small example to show it. Uses MySQL Connector/Python, but it does work also with others:
import mysql.connector cnx = mysql.connector.connect(db='test') cur = cnx.cursor() cur.execute("""CREATE TABLE innodb_t1 ( id INT UNSIGNED NOT NULL, c1 VARCHAR(128), PRIMARY KEY (id) ) ENGINE=InnoDB""") ins = "INSERT INTO innodb_t1 (id,c1) VALUES (%s,%s)" cur.execute(ins, (1,'MySQL Support Team _is_ already the best',)) cnx.commit() cur.close() cnx.close()
I explored two interesting topics today while learning more about Postgres.
Partial page writes
PostgreSQL’s partial page write protection is configured by the following setting, which defaults to “on”:
full_page_writes (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint… Storing the full page image guarantees that the page can be correctly restored, but at a price in increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.)
Trying to reduce the cost of full-page writes by increasing the checkpoint interval highlights a compromise. If you decrease the interval, then you’ll be writing full pages to the WAL quite often. This should in theory lead to surges in the number of bytes written to the WAL, immediately following each checkpoint. As pages are revisited over time for further changes, the number of bytes written should taper off gradually until the next checkpoint. Hopefully someone who knows more can confirm this. Does anyone graph the number of bytes written to their WAL? That would be a nice illustration to see how dramatic this surging is.
Decreasing the checkpoint interval seems a bit scary, and is bound to have its own costs, for all the usual reasons. A massive checkpoint once in a while should be really expensive, and would lead to a bad worst-case time for recovery. Does the new bgwriter implementation in 8.3 fix any of this? In theory it could, but I don’t know enough yet to say. I have heard conflicting opinions on this point. I have a lot more to read about it before I form my own opinion.
Storing full pages might not really be that expensive. It could bloat the WAL, but is the cost (in terms of time) really that high? InnoDB (in MySQL) protects against partial page writes with a double-write strategy: a region in the tablespace is called the doublewrite buffer. Page writes are first sent to the doublewrite buffer, then to their actual location in the data file. I don’t remember where, but I’ve seen benchmarks showing that this doesn’t hurt performance, even though it seems counter-intuitive. Modern disks can do a lot of sequential writes, and the way InnoDB writes its data makes a lot of things sequential. I doubt that putting full pages into the PostgreSQL WAL is forced to cost a lot, unless there is an implementation-specific aspect that makes it expensive.
The TODO has some items on the WAL, which look interesting — “Eliminate need to write full pages to WAL before page modification” and a couple more items. I need to understand PostgreSQL’s recovery process better before I know what these really mean.
Detecting data corruption
I was able to verify that the WAL entries have a checksum. It is a CRC32. This is in xlog.c.
However, as far as I can understand, the answer for detecting data corruption in normal data pages is “Postgres doesn’t do that.” I was told on the IRC channel that normal data pages don’t have checksums. I am not sure how to verify that, but if it’s true it seems like a weakness. I’ve seen hardware-induced corruption on InnoDB data many times, and it could sometimes only be detected by page checksums.
What happens when a page is corrupt? It probably depends on where the corruption is. If a few bytes of the user’s data is changed, then I assume you could just get different data out of the database than you inserted into it. But if non-user data is corrupted then do you get bizarre behavior, or do you get a crash or error? I need to learn more about PostgreSQL’s data file layout to understand this. Imagining (I haven’t verified this) that a page has a pointer to the next page, what happens if that pointer is flipped to refer to some other page, say, a page from a different table? If TABLE1 and TABLE2 have identical structures but different data, could SELECT * FROM TABLE1 suddenly start showing rows from TABLE2 partway through the results? Again I need to learn more about this.
This took me a bit by surprise at first. I don't find myself often agreeing with Matt. Most of what he tends to write/argue for is what I have referred to in the past as "crippleware". Canonical in recent time has taken to opening up their platform. I've been a strong advocate for Launchpad, it is a great service. I love that they opened it up in recent time. When it comes to infrastructure software on the size of LP, I don't believe that many others will ever install it. Slash, G-Forge, and the Livejournal software are examples of infrastructure software that approach the size or outweigh the LP codebase. They have rarely been successfully deployed by others. The advantage in the Launchpad software being open source is the potential for others to audit the code. I suspect that they will receive some patches, but I doubt that the number of patches will ever out pace what the conical staff itself creates.
This morning I got a number of worried pieces of email over Matt's new position at Canonical.
Do I find that I am worried about Assay joining Canonical?
Not really.
The job of the COO is too keep the company moving on a day to day basis. With his background at Alfresco, the COO role makes sense. Canonical has a lot of strong open source advocates so I wouldn't expect change in a direction that would create issue. Canonical's Ubuntu One is their longterm play. Service based revenue work hand in hand with open source go well together (...how many online services can you name that aren't based on open source?).
The COO position is one of the key positions that a company will hire for, yet, many smaller companies tend to pass over the creation of this position in lieu of having the CEO also fill this role. This is a real shame since you can often have a great CEO, who makes for a poor COO.
Ken Jacobs leaves Oracle
When Innodb was first acquired by Oracle there was a lot of shock and dismay within the MySQ Ecosystem. MySQL INC's reaction to the acquisition, which then rippled to the community, created a mistrust of Oracle. Ken Jacobs really changed that reaction in the community. There has been a number of times over the years that I found myself on the same side of the fence as Ken when it came to both leadership and technical vision about MySQL. I am sure Oracle has other competent executives to fill his shoes, but Ken has been a real asset to Oracle over the years. I am sad to see him leave the ecosystem, he played a very positive role in the community.
Oracle buying Innodb was never the killer move most envisioned at the time. It kick started engine development around MySQL, which was the only real innovation we saw for many years. Around the time of the acquisition all but one of the engineers who knew MySQL well, worked for MySQL. Having multiple companies working on engines re-invogorated outside development in the project. Without Oracle buying Innodb, the MySQL ecosystem would have never been forced into an innovators cycle again. PlanetMySQL Voting:
Vote UP /
Vote DOWN
We have been using tpcc-mysql benchmark for long time, and there many results published in our blog, but that's just single workload. That's why we are looking into different benchmarks, and one
of them is TPCE. Yasufumi made some efforts to make TPCE working with MySQL, and we are making it available for public consideration.
You can download it from our Lauchpad Percona-tools project, it's bzr branch lp:~percona-dev/perconatools/tpcemysql
Important DISCLAIMER:
Using this package you should agree with TPC-E License Agreement,
which in human words is:
You can't name results as "TPC Benchmark Results"
You can't compare results with results published on http://www.tpc.org/ and you can't pretend the results are compatible with published by TPC.
And we are not going to do anything from that, your primary goals is XtraDB/InnoDB performance research and/or compare with available Storage Engines for MySQL.
The workload in tpce is quite different from tpcc. Tpcc is write intensive, while tpce
is read oriented.
To give more details, there is stats for 10 seconds:
If you have multiple database servers with strange names, or if you have to hop over multiple machines to connect to any mysql database server, then you know what a pain it can be to administer such a setup. Thanks to some scripting, you can automate such tasks as follows:
Create an expect script:
/path/to/sshmysql.exp
#!/usr/bin/expect -f
#script by darren cassar
#mysqlpreacher.com
# you should change the word password in ’send — “password\r”‘ to your login password
# if you have the same password for each environment you could also script logging into mysql directly from the same expect script BUT that is not recommended.
Create a bash script:
/path/to/login.sh
#!/bin/bash
#script by darren cassar
#mysqlpreacher.com
The above setup can be used using any client/server OS: Linux, Solaris, MAC OS or Windows(running Cygwin)
NOTE: If you store the password in clear text inside the expect script, you should at least save the scripts inside an encrypted partition on your machine and make sure that folder is not shared or accessible by anyone. Another way of doing it would be to use either SSHKeys OR save the password inside a file and encrypt it using OpenSSL
One of the issues I have with the Free Software approach is that advocates have habit of throwing the baby out with the bathwater when discussing issues that they see as in any way negative to free software.
I was reminded of this while reading Bradley M. Kuhn’s criticism of Mark Shuttleworth’s reported views on copyright assignment.
Having read the original interview with Mark, and then Bradley’s response, it is pretty clear that the two have very different perspectives on copyright assignment: Mark is speaking from the perspective of a commercial business, Bradley form that of a non-profit foundation.
The two entities have very different reasons for enforcing copyright assignment policies, and Bradley is right to point out that a potential contributor should approach a copyright assignment policy from a commercial entity with a great degree of caution.
However, the ultimate reason for enforcing copyright assignment is about control. From a vendor’s perspective the desire for control is often to produce closed versions of the code. From the FSF’s perspective the desire for control is about keeping the code, and derivatives of it, open.
However, the fact that the FSF “promises to never proprietarize its versions of the software assigned to it”, does not support Bradley’s assertion that Mark “wants to confuse us about copyright assignment so we just start signing away our software”.
This claim is especially problematic given that Mark appeared (and it must be said we are reliant on the reporting of his statements to understand what he meant by them) to be attempting to reduce confusion around copyright assignments by, if possible, introducing some sort of standardization.
This is a suggestion that deserves more consideration. However, Bradley is so busy protecting the FSF from being maligned by Mark that he completely ignores the point raised by Mark - that copyright assignment policies are confusing, complex, and potentially problematic.
As the iTWire report demonstrates, the issue of copyright assignment is not just one that impacts contributions by individual developers (which is a common way of looking at it) but also of contributions from employees of Canonical to projects led by the likes of MySQL, Zope, Novell, Red Hat, Intel and others.
As previously noted, Oracle’s acquisition of Sun, and with it MySQL, has highlighted the issue of copyright control in encouraging/restricting community development in vendor-led development projects and providing acquirers with the potential to close an open source project.
Clearly, the issue is not as problematic for non-profit foundation-led projects, but the issue of copyright assignment needs more thoughtful assessment than a response that amounts to “non-profit=good, for-profit=bad”.
Ken Jacobs has been a fantastic advocate of the Oracle and MySQL user communites. I met Ken on the board of the Independent Oracle Users Group (IOUG). While Ken was the board liasson on the IOUG board, he was always supporting the Oracle user groups and made very important contributions throughout his time on the board and afterwards. After serving time on the board, Ken was still always PlanetMySQL Voting:
Vote UP /
Vote DOWN
What does it take for someone, fiercely loyal to a company to suddenly leave? Ken Jakobs, Oracle employee number 18, a man that sincerely loves the company, has resigned! The only reason I can think of is an extreme snub!
I must say, I am very disappointed. The prospect of Ken running MySQL was a light at the end of the tunnel for the community. Why? Because Ken is a MySQL insider! He knows the project, he knows the community.
As an engine developer I have come to know Ken well over the last 4 years. He lead the InnoDB team and is largely responsible for the improvements made to the engine since the Oracle acquisition. At the yearly Engine Summit he was always professional and constructive in his suggestions, with a deep technical knowledge of the subject. His track record shows that he has always kept his word with regard to Oracle's intensions with InnoDB, and I would trust him to do the same with MySQL.
Goodbye Ken. This is great loss for both the MySQL community and Oracle!
First, find out which MySQL you installed. This can be either the 32-bit or the 64-bit version. To make sure, find the mysqld (e.g. in /usr/local/mysql/bin) and do the following in a Terminal window:
If you see x86_64, you got 64-bit, otherwise 32-bit. If you see both, then you have a universal build. This is important for specifying the ARGSFLAG when building.
Download oursql from Launchpad and unpack it into some directory. Using the information from above, you'll have to do following for 64-bit platform (or universal build) in a Terminal window:
Following error will be reported when you don't specify the correct ARCHFLAGS:
ld: warning: in .../lib/libmysqlclient.dylib, file is not of required architecture
Tips:
When building failed, it is good to remove oursql, unpack it and try again.
If you don't want to compile anything, or run into more troubles, give MySQL Connector/Python a try (alpha releases). It's a pure Python implementation of the MySQL Client/Server protocol and doesn't need compiling or a MySQL installation.
I wanted to offer another InfiniDB load rate metric using the SSB lineorder fact table. In this case we are using a scale factor of 10,000 which translates to 60 Billion rows. As a point of reference, the recent Percona benchmark was at a scale factor of 1000 (6 billion rows) http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/ . The load rate per hour varied only slightly across the entire run, averaging about 4Read More... PlanetMySQL Voting:
Vote UP /
Vote DOWN
Well, for Matt Asay, I should start by congratulating you for the new job and nice title! (Also, we learn some intelligence from Matt's blog: apparently Canonical is already close to the size of MySQL AB at the time of the Sun acquisition.)
Usually we are told to "ignore the trolls" and all that. The blogosphere unfortunately seems to be full of commentators who like to have share their opinion - even while they are entirely clueless. Sometimes, like the comments on Slashdot, it is ok and considered part of the entertainment. Sometimes it is harmless, because nobody reads that blog. And sometimes, it is just unacceptable:
Here is one more procedure – (this time) for mass renaming of table. Adding and Removing table-name prefixes
A friend of mine was renaming 100+ tables by using replace methods in notepad.
I showed em a bit better editor: Editplus and then I thought of rescue rest of those who are still interested in some [...]
Related posts:
Matt Asay writes today in Oracle loses some MySQL mojo about Ken Jacobs leaving Oracle. For me, that’s a major bummer. Ken has been a long-time visitor of the MySQL Conference and that’s where I first met him: a friendly and knowledgeable person, on database technology in general but also about MySQL. When Innobase Oy got bought by Oracle, InnoDB got placed under Ken’s leadership and did pretty well there. We’d occasionally exchange emails, and I’ve always found him to be responsive and helpful.
I think it was kinda presumed by people that the technical part of MySQL at Oracle would also reside with Ken. Obviously now, that’s not going to be the case. What that means exactly, I don’t know as I am not familiar with the other person (Edward Screven). We’ve got to know Ken over the years, so it would’ve been nice to keep going with him. Ohwell.
Now we’ll see what Edward does with it all, and how he will interact with the MySQL community. And I wonder what new adventures Ken might be off to, if any?
We received some nice feedback on our care and feeding of InfiniDB blog entry, and we appreciate all of you who were kind enough to respond. We did fail, however, to communicate a few other intentions we have regarding how we plan to release and label the InfiniDB software so here’s some more thoughts from us on this important matter:
For new releases, we plan to follow the traditional alpha, beta, RC framework. Alpha means an upcoming release is not yet feature complete and moreRead More... PlanetMySQL Voting:
Vote UP /
Vote DOWN
Many times I’ve heard people advise on “best practices” for a MySQL database. This often includes routine maintenance, such as “you should run OPTIMIZE TABLE on all of your InnoDB tables once a week to defragment them for better performance.”
But this advice is unsubstantiated and could even be detrimental. Here are some of the obvious problems that I can think of:
The optimized table compacts the primary key (clustered index) to its default 15/16ths fill factor per page. But other indexes will be built in pseudo-random order and are likely to end up just as fragmented afterwards as before. Which indexes are more important for performance? Maybe the primary key is just a dummy value that’s not even used, and the secondary indexes are the ones that would benefit from compacting.
Suppose the primary key is the important one, and SELECT queries will perform more quickly if it’s defragmented. Why does it get fragmented? Because of changes to the table. Now these changes could suddenly slow down dramatically as they are forced to split pages at a much higher rate due to the more compact data layout.
Why do people make a blanket “you should defragment” statement without supporting it with hard facts? It sounds like something you’d hear from a naive Windows user who buys a $99 piece of software to make his PC “boot faster” or “fix his registry” or something. Maybe it ain’t broke and should not be fixed.
I believe we hear advice like this because there isn’t easy-to-get data that can tell us the truth. To make decisions about defragmenting tables responsibly, we need either performance data on that table (hard to get in most cases), or failing that, information about cost and frequency of page splits in general (not available from InnoDB at present). It would help to have these metrics, and I think it might not be very hard to add page-split instrumentation to InnoDB.
In case if you working on Windows environment for MySQL development (sometimes I use visual studio for easy debugging); and in case if you change the parser code (sql_yacc.yy) or if you are working directly from development branch (bzr launchpad), then the build breaks to generate the yacc files (sql_yacc.h and sql_yacc.cc) with an error bison: M4: Invalid argument as shown below:
1>------ Build started: Project: sql, Configuration: Debug Win32 ------1>Generating sql_yacc.h, sql_yacc.cc
2>------ Build started: Project: GenServerSource, Configuration: Debug Win32 ------2>Generating sql_yacc.h, sql_yacc.cc
1>bison: m4: Invalid argument
1>Project : error PRJ0019: A tool returned an error code from "Generating sql_yacc.h, sql_yacc.cc"1>sql - 1 error(s), 0 warning(s)2>bison: m4: Invalid argument
2>Project : error PRJ0019: A tool returned an error code from "Generating sql_yacc.h, sql_yacc.cc"2>GenServerSource - 1 error(s), 0 warning(s)
But if use source zip file for any particular release, then it won’t fail as the files (sql_yacc.cc and sql_yacc.h) are pre-built and copied to the distribution zip file.
But, again if you wanted to change the code or happen to save sql_yacc.yy, then it starts generating the files and build will break. It looks like lot of people are experincing the same problem to build parser code on Windows using any recent version of bison (not just MySQL code base).
Both bison.exe and m4.exe are in the path and they are the latest version; but still it fails..
It looks like the problem is with Windows version of bison to pick m4 executable even though m4 is in the path. For example, you can directy try to generate the files from sql directory using bison as…
c:\mysql-5.1\sql>bison-y-p MYSQL --defines=sql_yacc.h --output=sql_yacc.cc sql_yacc.yy
bison: m4: Invalid argument
The work around what I found is to copy m4.exe to sql directory directly, so that bison can pick from local working directory, then everything starts working as expected.
Kind of weired, but atleast there is a work around to change the parser code on Windows now; and it works great including Visual studio also starts building without any errors. But if you remove m4.exe from sql directory, then things starts to break immediately.
Oracle veteran and MySQL sympathizer Ken Jacobs has resigned from the database giant, calling into question Oracle's ability to deliver on its MySQL promises. PlanetMySQL Voting:
Vote UP /
Vote DOWN
One way to do a “Conditional Join” in MySQL is by using a “LEFT JOIN”. Create a “LEFT JOIN” for each condition and combine the results into one column using an “IF” statement by the “SELECT” expression. Here’s an example:
Suppose you have three tables:
questions: a table consisting of question ids, timestamps, and whether or [...] PlanetMySQL Voting:
Vote UP /
Vote DOWN
How To Set Up MySQL Database Replication With SSL Encryption On Ubuntu 9.10
This tutorial describes how to set up database replication in MySQL
using an SSL connection for encryption (to make it impossible for
hackers to sniff out passwords and data transferred between the master
and slave). MySQL replication allows you to have an exact copy of a
database from a master server on another server (slave), and all
updates to the database on the master server are immediately replicated
to the database on the slave server so that both databases are in sync.
This is not a backup policy because an accidentally issued DELETE
command will also be carried out on the slave; but replication can help
protect against hardware failures though.
The next presentation by Piotr Biel from Percona was on Multi-Master Manager for MySQL.
The introduction included a discussion of the popular MySQL HA solutions including:
MySQL Master-slave replication with failover
MMM managed bi-directional replication
Heartbeat/SAN
Heartbeat/DRBD
NDB Cluster
A key problem that was clarified in the talk is the discussion of Multi-Master and this IS NOT master-master. You only write to a single node. With MySQL is this critical because MySQL replication does not manage collision detection.
The MMM Cluster Elements are:
monitoring node
database nodes
And the Application Components are:
mon
agent
angel
MMM works with 3 layers.
Network Layer – uses a virtual IP address, related to servers, not a physical machine
Database Layer
Application Layer
MMM uses two roles for management with your application.
exclusive – also known as the writer
balanced – also known as the reader
There are 3 different statuses are used to indicate node state
proper operation
maintenance
fatal errors
The mmm_control is the tool used to manage the cluster including:
move roles
enable/disable individual nodes
view cluster status
configure failover
The Implementation challenges require the use of the following MySQL settings to minimize problems.
I have a hand-out ready in PDF. The slides are not usable without my chatter. It contains a few examples and links. Any comments, corrections, criticism.. are welcome!
Between following (from a distance) the talks at Fosdem and anticipating the ones at MySQL User Conference in April, I was reminded of 2 interesting MySQL talks that have had a deeper meaning to me than their original speakers probably intended. I thought today could be a good time to share these 2 stories that for me personally are filed in the "things I learned from MySQL AB and Sun" folder...
"If you can't solve the problem, try solving some other problem"
The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.
The end result was a page load improvement from 700+ms load time to a a consistent 60ms.
The last few weeks I am still being asked what is going on with Oracle and MySQL and where is MySQL with it's software releases. So I am going to include some URLs to hopefully answer some of your questions regarding MySQL.
Oracle's press release on December 14, 2009 regarding MySQL.http://www.oracle.com/us/corporate/press/042364
Summary list MySQL software releases. A more detailed list can be PlanetMySQL Voting:
Vote UP /
Vote DOWN
Following the opening keynote “Dolphins, now and beyond”, Marc Delisle presented on “State of phpMyAdmin”.
phpMyAdmin is an DBA administration tool for MySQL available today in 57 different languages. This is found today in many distributions, LAMP stack products and also in cpanel. The product is found at http://phpmyadmin.net.
There are current two versions, the legacy 2.x version to support older php 3.x & 4.x, The current version 3.x is for PHP 5.2 or greater.
The current UI includes some new features including.
calendar input for date fields
meta data for mime types e.g images, which is great for showing the output as an image, otherwise blob data
Relational designer with the able to show and create foreign keys
The New features in 3.3 (currently in beta) include:
Replication support including configuring master/slave, start/stop slave.
Synchronization model showing structure and data differences between two servers and ability to sync.
New export to php array, xslx, mediawiki, new importing features including progress bar.
Changes tracking for changes on per instance or per table. Providing change report and export options.
I had the honor of opening the day at the MySQL developer room at FOSDEM 2010 where I had a chance to talk about the MySQL product and community, now and what’s happening moving forward.
For those that missed the talk, my slides are available online at Slideshare however slides never due justice to some of the jokes including:
What do you consider? the Blue Pill, or the Red Pill
Why think two dimensionally, how about the Green Pill
Emerging Breeds with performance enhancing modifications
The recently revived MySQL Meetup group had its first meeting. A big thanks to Venu Anuganti, our host and Schooner our sponsor. If anyone is interested in joining the fun please check out the Meetup.com site here.
During our first meetup we went over basic MySQL topics including configuration, performance tuning, locking, along with running MySQL in the cloud. All in all it was a well done presentation and I am looking forward to the next.
In MySQL Workbench, there’s the standard tokens (table, stable, dtable, column) that you can use to help customize your naming conventions of your columns, keys, etc. What you can also do is apply/pipe filters to these tokens. The filters are upper, lower & capitalize. You can see in the image above how they can be applied.
I am trying to debug a server with a high rate of Aborted connects reported in SHOW STATUS.This frequently occurs when the maximum number of concurrent connections, max_connections, has been exceeded. Alas, this server does not have too many connections, I have no problem creating a connection and there are connections for all accounts so the password table is probably valid. It is time to read the source code. The aborted_connects counter is incremented when:
the call to pthread_create in create_new_thread() fails. This creates a new thread to handle the new connection and failure is unlikely.
the call to my_thread_init() in handle_one_connection() fails. This is unlikely.
the call to THD::store_globals() in handle_one_connection() fails. This is unlikely.
the call to check_connection() in handle_one_connection() fails. This is likely.
At this point I assume the failure is in check_connection() but there are many reasons for that to fail:
return ER_BAD_HOST_ERR when vio_peer_addr() fails
return ER_OUT_OF_RESOURCES when memory allocation fails
return ER_HOST_IS_BLOCKED when there were too many connection errors for this client's host. The counters for this are reset when FLUSH HOSTS is run.
return ER_HOST_NOT_PRIVILEGED when the client's host is not allowed to connect
return ER_HANDSHAKE_ERROR when there is a network during authentication (read or write fails, not enough data)
return an error when check_user() fails
Running FLUSH HOSTS does not fix the problem and I don't think memory allocation is failing. check_user() calls mysql_change_db() and that can fail for several reasons:
return ER_NO_DB_ERROR because a db name was not specified
return ??? on memory allocation errors
return ER_WRONG_DB_NAME because the format of the db name is bad
return ER_DBACCESS_DENIED_ERROR because the user is not allowed to access the db
return ER_BAD_DB_ERROR because the filesystem directory for the db does not exist or cannot be accessed
Finally, there are several reasons for a failure in check_user() other than failures from mysql_change_db():
return ER_HANDSHAKE_ERROR when there is not enough data in the authentication packet or network writes fail
return ER_CON_COUNT_ERROR when there are too many concurrent connections (max_connections)
return ER_TOO_MANY_USER_CONNECTIONS when max_user_connections has been exceeded
return ER_USER_LIMIT_REACHED when there were too many connections, updates or queries in the last hour for this account
return ER_NOT_SUPPORTED_AUTH_MODE when the client uses the wrong a short hash to authenticate
return ER_ACCESS_DENIED_ERROR when authentication fails
That is a lot of potential errors. How do I find the problem? It would be easy if I had access to all client hosts and all client software logged all errors. That is rarely true for large deployments. It would be easier if I had SHOW USER_STATISTICS which had a few counters for authentication failures per user. From that I could determine whether the problem was limited to a known account. However, even that doesn't count some of the errors including attempts to use accounts that don't exist.
I want to know which errors occur most frequently. For that I need the command SHOW ERROR COUNTS. Alas, that command does not exist. Otherwise, I need to figure out how to count errors by parsing tcpdump output.
Time to assign the problem to someone else. PlanetMySQL Voting:
Vote UP /
Vote DOWN
Not quite pop quiz format, but if you enjoyed the ones I published some time ago (almost 2 years ago now… how time flies), you’ll probably be interested to know that (more…)
I’ve recently become supremely disappointed in the availability of Nagios checks for RAID cards. Too often, I see administrators rely on chance (or their hosting provider) to discover failed drives, a dying BBU, or a degrading capacity on their RAID cards. So I began work on check_raid (part of check_mysql_all) to provide a [...] PlanetMySQL Voting:
Vote UP /
Vote DOWN
*Matt Asay moves from Alfresco to Canonical
*GPL fade fuels heated discussion
*Apple’s iPad and its enterprise and open source impact
*Open source in data warehousing and storage
*Our perspective on Oracle’s plans for Sun open source
For anyone that missed the recent webinar on getting the best performance out of MySQL Cluster then the replay is now available from mysql.com.
Benefits of connection pooling
Are you experiencing current performance bottlenecks in your high availability applications ? Are you designing a new mission-critical application and want to know how best to structure your schema and index strategy for optimal performance? Interested in how to transform your SQL into faster, more efficient queries?
Then this free web presentation is for you! You will get expert insight and learn best practices to help you identify those areas of database and application design that will give you the greatest benefits for performance when using MySQL Cluster.
It discusses guidelines and best practices covering the following areas:
General Design Concepts and Guidelines
Schema Optimization
BLOB/Text vs VARBINARY/VARCHAR
Partition by Key
Index Selection and Tuning
Basic Query Tuning
MySQL Cluster Parameter Tuning Guidelines
Tools to accelerate application development and testing
Welcome, everyone, to the 177th edition of Log Buffer, the weekly review of database blogs. It was another week heavy with technical posts, so let’s waste no time, and get it all started with . . .
PostgreSQL
David Fetter shares his recipe for adding only new rows: “Let’s say you have a table and a data set, and would like to add only those rows in your data set that aren’t already in the table. There are hard ways, but here’s an easy one.”
Simon Riggs, the Database Explorer, offers his thoughts on parallel query in Postgres: “I’m disappointed we’ve not made much progress with parallel operations and partitioning in core Postgres in last few releases. Recent Greenplum results show we have much work to do in improving things.”
Contrariwise (or not), Iggy Fernandez offers five reasons not to attend a NoCOUG conference in 2010. “Most Oracle professionals will benefit a lot from attending a NoCOUG conference in 2010. However, the following categories will not benefit much: . . . Those Oracle professionals who believe that Oracle’s goal in buying Sun is to replace Oracle Database with MySQL. This is probably a very small group . . . ”
Uwe Hesse, the Oracle Instructor and his readers share a discussion on sharing READ ONLY Tablespaces between databases. Uwe says, “ . . . the question was raised, whether it is possible to use the same READ ONLY Tablespace in multiple Databases. At first glance, I thought that this should of course be possible, though the answers where somewhat discouraging. So I have done a quick test to prove it . . . ”
Krishna Manoharan of Performance Engineering and Capacity Planning steps up with an item on understanding CPU time as an Oracle Wait event. Krishna pursues the question, “[What] if the stats from the system show that CPU Utilization (% Util and Run queue) are well within thresholds and show plenty of available capacity, but Oracle continues to report CPU time as a Top 5 wait event?”
Charles Hooper grapples the question, Which Plan is Better? Charles writes, “A recent post appeared in the OTN forums that indirectly asked the question: which execution plan is better? . . . If you are attempting to conclude which plan is faster/better based on the estimates in [a] first plan and an altered plan with a hinted cardinality estimate, you might be setting yourself up for failure.”
“Seems like all I ever write about these days is SQL Profiles,” writes Kerry Osborne. “I do other stuff, honest! It just seems like getting Oracle to do what you want when you can’t touch the code is the closest thing to ‘Magic’ that DBAs get to do.” Here’s Kerry’s post on single-hint SQL profiles, inspired by a discussion with Jonathan Lewis.
And now here is Jonathan with a post on SQL Server. . . . Wait a minute–what the?!
SQL Server
Ahem. And now here is Jonathan Lewis with a post on SQL Server. Yes, Jonathan Lewis, famous Oracle guy. “A few days ago,” Jonathan writes, “I did a presentation on SQL Server. . . . The title was ‘What the Enterprise needs in an RDBMS’ . . . and the presentation was about whether or not you could find everything you needed in SQL Server 2008, where you’d have to look in the manuals, and supplementary questions you’d have to ask.”
Buck Woody also has been, as it were, treading the boards: “I give series of classes and presentations on Data Design. I say ‘data’ design instead of ‘database’ design because we should consider more than just the database. . . . Here are the links I use in that presentation. Although this isn’t a comprehensive list of Data Design topics, I’ll visit this topic from time to time so you may want to bookmark this page in your favorites[.]”
Buck Woody relays more nastiness in his post, Transparent Data Encryption and the Latest Data Breach: “Well, It’s happened again. Hundreds of thousands of private records were stolen from a database. This one, however, was different. No one stole any passwords, no one did any social engineering, nothing was captured in-line. No, this one was accomplished by stealing the actual hard drives themselves!”
One way to circumvent this–no hard drives! But wait–before you send them all to the kilns, Aaron Bertrand says, your laptop may be ready for SSDs, but are your SQL Servers? “I am not trying to be Debbie Downer here,” he explains,” … SSDs sound great . . . But right now, if you are looking at expanding or upgrading your I/O under SQL Server, I’d give the vendors some time to shake off these early jitters.”
Here’s Thomas LaRock, DBA Survivor, with a Name That Caption Contest. The deal is: “The person who provides the best caption will win a copy of my book,” DBA Survivor, “and I will figure out a way to incorporate your caption into the main page.”
On EXPLAIN EXTENDED, Quassnoi examines some details of join on overlapping date ranges in answering the question, “Is there any way to optimize the query for overlapping ranges in MySQL if both ranges are dynamic?”
Even though things have been a tad bit turbulent around here in the recent weeks, our engineers did not rest and churned out an impressive number of updates and new releases of the MySQL Server and related products.
Here's a quick summary of what we released this year so far (in chronological order):
Kudos to the developers! Source and binaries can be downloaded from the usual place. Enjoy! We welcome your feedback and bug reports. PlanetMySQL Voting:
Vote UP /
Vote DOWN
MySQL Cluster can be used as a general purpose transactional storage engine, but if you convert all your InnoDB tables to it and connect your application straight to it you may not see the performance you were hoping for. This is because MySQL Cluster was originally designed for real-time telecommunications applications (such as RADIUS servers). It has slowly been modified to become more general purpose and improvements are being made every day but there are still some performance considerations which go with this. In some cases tweaking your schema and/or queries can help performance dramatically, so I shall try and outline some of things to watch for here.
Indexes
The fastest type of lookup you can do in Cluster is a primary key equality lookup (ie. SELECT * FROM table WHERE pkey = 2). This is because the primary key is stored as a hash index as well as an optional ordered index. This hash index is used to partition the data between the data nodes. MySQL is smart enough to process the hash and go directly to the data node with the data.
When running a query which uses an ordered index (or unique hash index) the query is sent to the Transaction Coordinator in one node which then asks the Local Query Handlers in one or more nodes to return rows that match this query. The Transaction Coordinator is aware of the indexes so knows which nodes to ask to process the query. This in general is slightly slower and can actually perform worse for ordered indexes as more data nodes are added because more Local Query Handlers need to be contacted.
Finally if the query is a table scan the Transaction Coordinator must ask all the Local Query Handlers to search for the data. This is much slower.
Joins
Joins in MySQL Cluster do not currently perform well. Internally the second table must be queried to match every row returned by the first table, this can mean a lot of network traffic which can slow things down. There is work in progress to improve this by pushing the join condition down into the data nodes, this will give a massive performance increase when using joins. More information about this (called SPJ) can be seen in Jonas' blog.
BLOBs and TEXT
As I have already mentioned in this blog BLOBs (and TEXT) columns require a separate table to hold most of the BLOB data. This can cause performance problems and locking issues so if possible VARCHAR or VARBINARY should be used instead.
Transactions
MySQL Cluster currently only supports READ COMMITTED transaction isolation level so if you are used to REPEATABLE READ that you typically get from engines such as InnoDB you may want to check that this will not cause problems with your application.
In general we recommend transactions should be short, limited to just a few queries. MySQL Cluster can give temporary errors for a number of reasons, these do not happen often but will mean that the entire transaction will need to be started again. For many applications it will be very hard to replay a transaction which has been running for a whole hour. Also committing very large transactions can lead to GCP Stop.
I am in Brussels, waiting to attend FOSDEM 2010, one of the biggest open source gatherings in Europe, taking place this weekend in Brussels. On Sunday, there is a Developers Room for MySQL and Friends, with 14 talks from open source professionals coming from Europe and North America.
The novelty of this round of talks is that thy will be 20 minutes long, rather than 1 hour. This will force all presenters to be more cautious about their timing, and to concentrate their talks on the essential. Even the experienced ones, who have given the same talk several times, will have to make an effort to come to the point in less time. The idea cam from reading Scott Berkun's book, Confessions of a public speaker, where he argues successfully on the usefulness of short lectures. It's going to be interesting! The hosts of the Developers Room are Ronald Bradford and myself, but nothing of this could have happened without the excellent preparatory work done by Lenz Grimmer, who can't be here to enjoy the results of his organization, because he must stay home, waiting for his second child to come any moment. Thanks, Lenz, and good luck! In addition to the DevRoom, I will have a lightning talk on an unusual (for my public speaking record) topic: Blaming the unknown: a positive approach to technology. If you happen to be around, come see it. It's fun, I promise you, and also informative, or so I hope.
MariaDB 5.1.42, a new branch of the MySQL database which includes all major open source storage engines, myriad bug fixes, and many community patches, has been released. We are very proud to have made our first final release, and we encourage you to test it out and use it on your systems.
For an overview of what’s new in MariaDB 5.1.42, please check out the release notes.
For information on installing MariaDB 5.1.42 on new servers or upgrading to MariaDB 5.1.42 from previous releases, please check out the installation guide.
MariaDB is available in source and binary form for a variety of platforms and is available from the download pages.
We welcome and appreciate your feedback, bug reports, bug fixes, patches, and participation on our mailing list. Find out more about working with the community.
I can't read posts like this one without at least a little chuckle. Is the number one question you should be asking people how to start and stop MySQL on Windows? How does that really demonstrate how good someone is at their job when most people deploy on Linux[1]?
The original MySQL certification for 4.1 used to ask a whole bunch of trivia exactly like this - my favourite was a question where you had to say if a particular subqueries caused a syntax error. I don't know how this tests skill, since most subqueries shouldn't be used in production (hint: they are unoptimized in MySQL).
But MySQL changed its certification format: the new exams are Performanced-Based. This means to pass, you have to solve some of the problems you will be doing in real life. Hats off to Dave for leading this initiative.
Technical interviews need to change just as MySQL has. They should be organized in a way that doesn't intimidate the candidate who might know what they are doing, but can't always express it words when under pressure. Silly questions and 'gut feelings' about responses tend to favour the over confident.
I had a hand in designing the interview process at Percona. One of the steps candidates go through is a challenge to be completed on two running EC2 instances. I don't think it's flawless, but you tell me what is likely to be a better indication of talent:
Test #1: * What does tee command do in MySQL? * What is a serial data type in MySQL? * If I created a column with data type VARCHAR(3), what would I expect to see in MySQL table?
Test #2: * Log into these two servers (xxx is the master, yyy is the slave). * Tell me if you think there is a replication problem. * Resync the slave using the lowest impact method possible if there is. * Optimize these two queries while you are at it.
Test #2 isn't the actual test we use, but it's not far off.
I know a lot of DBAs that probably can't answer test #1 correctly. Does this mean they are bad at their job? That is one possibility, but the more likely is that this test is useless and should only come out on Pub Trivia night.
[1] On an unrelated note, the example answers for questions 3, 11, 12, 14, 15, 22 also demonstrate a misunderstanding ranging from small to just fundamentally wrong. PlanetMySQL Voting:
Vote UP /
Vote DOWN
INNODB has some irritating gotchas that makes disk space management hard. In 2002ish INNODB, added innodb_file_per_table to get around allot of these issues, but it does not fix everything.
If you are running innodb_file_per_table, you will notice in your database directory
db.opt - database characteristics file.
tablename.frm - the table structure.
tablename.ibd - the actual innodb table space file
Imagine that you have a table with 10 million rows and you delete say 5 million rows in multiple chunks around 400K chunks, because deletes are slow. Next, you notice that the table space file did not shrink. So what do you do? OPTIMIZE tablename, tada all the wasted space is reclaimed, but here is the PROBLEM the ibdata file grew! ibdata stores all of the UNDO LOGS thus GROWS due to the deletes and space is never reclaimed.
Ok, lets try coping the tablespace file to another directory and re-import the tablespace file after wiping the data dictionary (ibdata).
For instance /etc/init.d/mysql stop cd /var/lib/mysql/DB cp * /tmp/hold rm /var/lib/mysql/ib* cp /tmp/hold/* /var/lib/mysql/DB /etc/init.d/mysql start // create the ibdata file
ALTER TABLE tablename IMPORT TABLESPACE ERROR 1146 (42S02): Table 'DB.tablename' does not exist
Really, there is nothing that you can do to force innodb to purge the undo logs in the ibdata file without a full dump of the data to text and then reimport the data.
So here are the steps to shrink all table spaces and the ibdata file
mysqldump --all-databases (or use mk-parallel-dump) stop mysql rm -f /var/lib/mysql/ib* /var/lib/mysql/DB/* start mysql mysqlimport (or use mk-parallel-restore)
Takes a bunch of time but there is no other recourse. If you know of another way please share :)
by Alex Bolenok (Quassnoi) at 12:00 PM, 02/04/2010
Answering questions asked on the site.
Daniel asks:
I have a table which stores track titles in a VARCHAR(200) field. The field is indexed, but searching for titles beginning with a letter Z is noticeably slower than for those beginning with A, and the closer the initial letter is to Z, the slower is the query.
My understanding is that a full table scan occurs, but EXPLAIN shows that the index is used. Besides, the table is quite large but the query is still reasonably fast.
Could you please explain this behavior?
MySQL stores its indexes in B-Tree data structures.
The Wikipedia link above explains the structure quite well so I won’t repeat it. I’ll rather draw a picture similar to the one in the article:
This picture is quite self-explanatory. The records are sorted in a tree order, so if you are searching for a certain value, say, 11, you, starting from the first page, should find the link to follow. To do this, you need to find the pair of values less than and greater than 11. In this case, you should follow the link which is between 8 and 12. Then you search for the next pair, etc, until you find your value or reach the end and make sure that your value is not there.
Following the links is quite simple, but how does the engine search for the values within one page?
This depends on how you declared the table.
MyISAM supports two algorithms for storing the index keys in a page: packed keys and unpacked keys.
Unpacked keys are what you are seeing on the picture above: each page just stores the key values and the links to the pages down the tree. This is very simple.
Packed keys are designed to improve performance on character data. Many words and phrases, especially those that are close to each other, start with the same sequence of characters.
If you are going to store track names like:
The Man Who Sold The World
The Man Who Invented Himself
The Man Who Has Everything
, MyISAM can optimize it in terms of storage space and store them like this
(The Man Who) Sold The World
(×11) Invented Himself
(×11) Has Everything
This is called key compression: instead of repeating the key value for each record, MyISAM just stores the longest common prefix once and prepends the subsequent records that share it with its lengths. This makes the keys shorter and the index more compact.
However, this affects the index search time.
With an unpacked index, a binary search is applied to find the keys within each level page.
With a packed index, this won’t work: you need to know the value of two keys to compare them and not every record contains full information about the key.
So in case of a packed index, MySQL remembers the value of the prefix and iterates the records one by one. This is less efficient than a binary search, but due to the fact that much more records can fit on on page, this keeps the amount of page traversals to a minimum and overall efficiency increases.
But the records on one page still need to be compared and searched for. And with a linear search, the keys with less values tend to require less iterations than those with greater values.
Let’s look on the picture above again. The keys are searched left to right.
To search for key 1, we only need two comparisons: compare to 4, get the next page, compare to 1.
But to search for key 15, we need to compare to 4, 8, 12 then get to the next page and compare to 13, 14 and finally to 15.
This is 6 operations compared to 2 required to fetch the first key.
Now, let’s create the sample tables and see some figures:
CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;
CREATE TABLE t_source (
id INT NOT NULL PRIMARY KEY
) ENGINE=MyISAM;
CREATE TABLE t_packed (
id INT NOT NULL PRIMARY KEY,
name CHAR(6) NOT NULL,
KEY ix_packed_name (name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=1;
CREATE TABLE t_unpacked (
id INT NOT NULL PRIMARY KEY,
name CHAR(6) NOT NULL,
KEY ix_unpacked_name (name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=0;
DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT
INTO filler
SELECT _cnt;
SET _cnt = _cnt + 1;
END WHILE;
END
$$
DELIMITER ;
START TRANSACTION;
CALL prc_filler(50000);
COMMIT;
INSERT
INTO t_source
SELECT id
FROM filler;
INSERT
INTO t_packed
SELECT id,
(
SELECT GROUP_CONCAT(CHAR(65 + FLOOR(RAND(20100204) * 26)) SEPARATOR '')
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
) q
)
FROM filler;
INSERT
INTO t_unpacked
SELECT *
FROM t_packed;
There are two MyISAM tables with randomly generated character sequences like this:
id
name
1
RTUPPH
2
RKZQJW
3
FKMEKL
4
BYOZFE
5
GSTRAF
6
YBNMSG
7
ZEZKCE
8
PMPNUJ
9
OQMMYH
10
OYAFDZ
49999
DZMKRC
50000
NHYWLR
The structure of the tables is identical, except that t_packed packs keys and t_unpacked does not.
Unpacked keys
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_unpacked p
ON p.name = IF(s.id > -1, _UTF8'ABCDEF', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (0.7656s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_unpacked p
ON p.name = IF(s.id > -1, _UTF8'GHIJKL', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (0.7331s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_unpacked p
ON p.name = IF(s.id > -1, _UTF8'NOPQRS', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (0.7656s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_unpacked p
ON p.name = IF(s.id > -1, _UTF8'ZYXWVU', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (0.7712s)
The queries against the strings beginning with A, G, N and Z take the same time. All queries have been run several times to populate the cache and the execution times are consistent.
The LEFT JOIN against a non-existent value was used in the query to avoid stopping on a found key and make the query traverse the index as much as possible. We also put a formal dependency on s.id here so that t_source is always leading in the join and no const optimizations are performed.
Packed keys
Let’s try the same queries on packed keys:
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_packed p
ON p.name = IF(s.id > -1, _UTF8'ABCDEF', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (1.9531s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_packed p
ON p.name = IF(s.id > -1, _UTF8'GHIJKL', NULL)
COUNT(*)
50000
1 row fetched in 0.0001s (2.3593s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_packed p
ON p.name = IF(s.id > -1, _UTF8'NOPQRS', NULL)
COUNT(*)
50000
1 row fetched in 0.0002s (2.7812s)
SELECT COUNT(*)
FROM t_source s
LEFT JOIN
t_packed p
ON p.name = IF(s.id > -1, _UTF8'ZYXWVU', NULL)
COUNT(*)
50000
1 row fetched in 0.0002s (2.9375s)
The query against the string beginning with the letter Z takes more than 50% more time than the query against a string beginning with A.
Summary
Here’s a little summary table:
Search string
Unpacked key
Time, %
Packed key
Time, %
ABCDEF
0.7656
100.00
1.9531
100.00
GHIJKL
0.7331
95.75
2.3593
120.79
NOPQRS
0.7656
100.00
2.7812
142.39
ZYXWVU
0.7712
100.73
2.9375
150.40
We see that the value being searched for does not affect time to search the index with unpacked keys but seriously affects performance of the indexes with packed keys.
This increase is due to linear search used to locate the records within a single page.
Note that not any pair of records in a page (both with packed and unpacked keys) have a corresponding lower-level page containing intermediate values. There can be leaves and branches on the same depth in the tree.
This can lead to some artifacts: certain strings can be found (or proved absent) faster than the others. However, in average, all records have same depth. And in average, with packed indexes, the need for linear search increases the time required to find the keys with the greater values.
Hope that helps.
I’m always glad to answer the questions regarding database queries.
Installing Nginx With PHP5 And MySQL Support On Fedora 12
Nginx (pronounced
"engine x") is a free, open-source, high-performance HTTP server. Nginx
is known for its stability, rich feature set, simple configuration, and
low resource consumption. This tutorial shows how you can install Nginx
on a Fedora 12 server with PHP5 support (through FastCGI) and MySQL
support.
From MONyog 3.7 we have introduced 3 editions of the commercial version of MONyog. When we had first released MONyog 2 years back, it was already an almost complete tool for monitoring MySQL servers based on what the MySQL server exposes on SHOW statements. Since then we have added features that are not basic server monitoring features in the strict sense but additional or supplementing features. Most important the Query Analyzer was added around one year ago.
Multi-user authentication is a long standing request from customers belonging to large organizations – or just having the need for monitoring lots of MySQL servers. There may be multiple server administrators. There may be development/test servers that the developers of that organization should be able to monitor, but is may also be unwanted to give them access to see details from servers having delicate information (customer data, payroll data etc.). We have completely rewritten the MONyog authentication system. It is now possible to create multiple user profiles and give access for a specific user to a subset of servers available only. Additionally specific ‘admin’ functionalities (access to edit server settings, to KILL queries and to execute FLUSH STATUS) can be disabled for a user.
As MONyog is no longer just a tool that organizes what the server exposes on SHOW statements – but rather a ‘bundle’ of tools – we have realized that not all tools are useful to every MONyog customer. MONyog is no longer a “one-size-fits-all” application. MONyog caters to a very diverse range to customers. In some organizations MONyog may be used by a single person only, and in other organizations by dozens of people. Some users will not use the Query Analyzer at all (because they only run standard applications where they are not in control of schema and query design anyway). And so on. The Ultimate edition is for those users in particular that have full control over their database and their applications and who need multiuser authentication. The Enterprise edition provides the basic monitoring functionalities as well as the Query Analyzer – but not multi-user authentication. The Professional edition is an entry-level version with the (original) basic monitoring functionalities. Refer to the comparison sheet for full details.
We thought it is now the right time to have multiple editions of MONyog. This also means that people can start with the lower edition and gradually move to a higher edition if the need arises. This will also reduce the overall complexity and total cost of ownership for several customers.
We have also migrated all existing MONyog customers to MONyog Ultimate and Ultimate upgrade prices are kept moderate. So if you are already a MONyog customer your Total Cost of Ownership remains same while you continue to enjoy the powerful tools and features of Ultimate.
Release notes in traditional form can be viewed here.
Feel free to leave your feedback in the comments section.
You may have read Bernd’s recent post that explained how to try out some new beta functionality for MySQL Cluster and wondered what kind of use you could put the new ndb$info to. ndb$info uses tables/views to give real-time access to a whole host of information that helps you monitor and tune your MySQL Cluster deployment. This article gives one example, extending MySQL Enterprise Monitor to keep an eye on the amount of free memory on the data nodes and then raise an alarm when it starts to run low – even generating SNMP traps if that’s what you need.
One of the features of MySQL Enterprise Monitor is that you can define custom data collectors and that those data collectors can run SQL queries to get the data. The information retrieved by those custom data collectors can then be used with rules that the user defines through the MySQL Enterprise Monitor GUI to create warning/alarms.
In this example, I create two new data collectors and store the files in the “<MySQL Enterprise Monitor installation directory>/agent/share/mysql-proxy/items/” directory before starting up the MySQL Enterprise Monitor agents:
cluster_max_used.xml:
<?xml version="1.0" encoding="utf-8"?>
<classes>
<class>
<namespace>mysql</namespace>
<classname>cluster_max_used</classname>
<query><![CDATA[SELECT MAX(used) as Used FROM ndbinfo.memoryusage
WHERE DATA_MEMORY = 'DATA_MEMORY']]>
</query>
</class>
</classes>
cluster_min_avail.xml:
<?xml version="1.0" encoding="utf-8"?>
<classes>
<class>
<namespace>mysql</namespace>
<classname>cluster_min_avail</classname>
<query><![CDATA[SELECT MIN(max)as Max FROM ndbinfo.memoryusage
WHERE DATA_MEMORY = 'DATA_MEMORY']]>
</query>
</class>
</classes>
Fig. 1 Rule definition
In MySQL Enterprise Monitor, rules are grouped together into Advisors and so I create a new Advisor called “MySQL Cluster” and then create just one new rule within that Advisor group.
As shown in Fig. 1 the rule is called “Data Node Low Memory”. The “Variable Assignment” section is used to define 2 variables %used_mem% and %config_mem% which are populated from the Used and Max results from the 2 new data collectors. The “Expression” section is used to test “(Max - Used) < THRESHOLD” and then the values to be substituted for THRESHOLD are defined in the “Thresholds” section – indicating at what points the Info, Warning and Critical Alters should be raised.
There are then a number of optional sections that you can use to add useful information to the person investigating the alert.
Once the rule has been created, the next step is to schedule it and (if desired) tag that the alerts should also result in SNMP traps being raised. This is standard MySQL Enterprise Monitor practice and so it isn’t explained here except to point out that this rule is monitoring information from the data nodes but the rule has to be applied to a MySQL Server (MySQL Enterprise Monitor has no idea what a data node is) and so you need to schedule the rule against one or more arbitrary MySQL Server instances in the Cluster.
Fig. 2 Warning alert
To test the functionality, start adding more data to your MySQL Cluster until the Warning alert is triggered as shown in Fig. 2. As you can see, the optional information we included is shown – including values from Used and Max.
Fig. 3 Critical alarm
I then add more data to the database until the critical alert is raised and confirm that it’s displayed on the main monitoring panel of the MySQL Enterprise Monitor dashboard. Note that if you requested these alerts be included with the SNMP feed then SNMP traps will also be raised.
Please note that this example is intended to illustrate the mechanics of setting up monitoring on an arbitrary piece of data from ndbinfo and obviously in the real world you would want to monitor more than just the memory and even for the memory, you might want to use a more sophisticated rule.
Couldn't find a link for Community Server Download or something similar, I've been working with MySQL for over 5 yrs, I know that the downloads and most of the useful stuff for open source devs like me is on http://dev.mysql.com. The Developer Zone hosts the following for us
Community Server Downloads
Documentation (which keeps changing its links)
Articles
Forums
Beginner articles
MySQL forge
... and much more
I'm really surprised and ashamed of this MySQL's or Ora@#$'s move to hide such a widely used link.
I've done my part in adding rel='nofollow'attribute to http://mysql.com
).
However benchmarks by itself did not cover all cases I would want, so I was thinking about better scenario. The biggest problem is to get real big enough dataset, and I thank to Bradley C. Kuszmaul, he pointed me on Wikipedia statistics on access to Wikipedia pages, and thank to Domas, who made stats accessible. Link to the archives: http://dammit.lt/wikistats/archive/ or the original Domas's announcement .
Although the table does not have very much different information, I think it is good enough to represent cases you can face in Web application ( log processing, page visits, clickstream, etc).
I made some efforts to normalize data to have model in classic star schema and prepared queries that could be run on proposed dataset (John Sichi, lead of LucidDB helped me to draft some queries).
You can see details on our Percona Wikistat benchmark Wiki.
I have next goals with proposed benchmark:
Compare engines in OLAP queries for planning, predicting growth, analyzing access patterns to wiki pages, draw trends.
Compare engines in statistical queries for end users, which can be executed in real-time. I.e. How many times that or another page was accessed yesterday vs today.
Understand specific features and characteristic of each engine.
Compare throughput on simple queries (queries and scenario to be drafted yet)
Check ability to load data and serve queries at the same time ( availability during data load ) (queries and scenario to be drafted yet)
So in proposed schema I have four tables: pagestat (fact table), and pages, datesinfo, projects (dimensions tables).
Dimensions tables are supposed to be static and not changed, and we can change datasize
by varying amount of months loaded into fact table (so this is scale factor).
EER diagram
( made with MySQL Workbench )
In current dataset, which you can download from Amazon snapshot (name: “percona-wikistat”, ID:snap-a5f9bacc) we have:
Table pages: 724.550.811 rows. data size: 40476M
Table datesinfo: 9624 rows, one entry represents 1 hour
Table projects: 2025 rows
Table pagestats
Data for 2009-06: # 3.453.013.109 rows / size 68352M
Data for 2009-07: # 3.442.375.618 rows / size 68152M
So with two months of stats we have about 172GB of data with about 7 billion rows in fact table.
Percona is hiring. As part of our growth process, we introduced the role of the Shift Support Captain in 2009
(see the original announcement here) to
provide customers with a 24x7 technical contact person.
The Shift Support Team dispatches incoming emergencies, assigns new issues, handles or escalates incoming Nagios alerts from some customers, and in general makes sure things get handled
and customers get the attention they need.
You must be detail oriented, service oriented, tech savvy and as all Percona staff, able to work from home with little supervision. This is a growth position, so we're also looking for people who can become
consultants with time.
The basic requirements are:
Excellent written and spoken English.
Knowledge of MySQL and InnoDB.
Knowledge of Linux systems administration.
Knowledge of popular LAMP technologies such as Apache.
General programming and scripting abilities.
Proven success working in a distributed environment where
e-mail, IRC and voice calls are the only interaction with
clients, colleagues and managers on a daily basis.
Work is scheduled in 8-hour shifts, and some weekend shifts will be required.
Some of the benefits of working for Percona include good pay and benefits, working from home, with a team of experts who work on challenging projects everyday and who develop/contribute tomanycoolprojects
Some travel is required, at least for a yearly company meeting, and for some conferences and other events.
February and March as busy months for Community events. There's MySQL University, Fosdem, the Seattle MySQL Meetup & Confoo.ca. Here are the details:
February 4th - I'll be presenting a talk via MySQL University on Optimizing Queries with Explain. This talk will be about learning to read the output from MySQL's EXPLAIN command, and optimizing some example queries from the IMDB dataset. At 5.7G in InnoDB tables imported before any secondary key indexes IMDB is one of my new favorite example databases. It's about the right size on most desktop PCs that you can measure the difference between a query that's optimized and one that's not. All material comes courtesy of a new chapter I've been writing for our Developer's class on Schema & Indexing.
February 7th - My colleague Piotr will be presenting on Multi-Master Replication Manager for MySQL at the MySQL & Friends Dev Room. Besides Piotr's talk, there will be a number of MySQL enthusiasts in Brussels for FOSDEM. I recommend reading Lenz's announcement or the wiki page for more information.
February 16th - I'll be at the Seattle MySQL User Group giving a talk on Quick Wins with third party patches for MySQL. This is an old talk I've given previously, but I'll be updating it to demonstrate some of the really cool things you can do with XtraDB R-9.
February 23rd - The Boston MySQL Meetup is organizing a social dinner. More details to come.
A while back Friendfeed posted a blog post explaining how they changed from storing data in MySQL columns to serializing data and just storing it inside TEXT/BLOB columns. It seems that since then, the technique has gotten more popular with Ruby gems now around to do this for you automatically.
So when is it a good idea to use this technique?
If the application really is schema-less and has a lot of optional parameters that do not appear in every record, serializing the data in one column can be a better idea than having many extra columns that are NULL. The restriction on this, would be that searching on these columns now becomes more difficult[1]. A good example of this optional nature of data is user preferences - you only really need to store the settings that differ from the default values.
The second situation where I can see this technique making sense, is that when you update the text/blob, a large percentage of the data is actually modified. If you have to write back the full blob for 1 change, it is expensive.
Another potential pro for this technique is that ALTER TABLE commands are no longer required. Until 5.1-plugin simple operations like adding a secondary index on an InnoDB table require the whole table to be rebuilt (now just the index is built). I don't really buy this pro, since using master-slave switching I've always been able to emulate online index adds. Projects like MMM and Flipper make this even easier.
In Friendfeed's case, they also used the serialization to be able to compress the objects. From 5.1-plugin this is now available natively.
[1] Friendfeed solved this indexing problem by creating separate 'index' tables.
What are the downsides?
I would say that the first serious downside is write amplification. If you are constantly making small updates to one piece of data in a very large blob, the effort MySQL has to go to is greatly increased.
The second downside I would mention is that this pattern tends to force you to read/write larger amounts of data at once. We've all been trained to know that SELECT * is bad. This creates a pattern where SELECT * is not optional. In practical terms I would expect this increases the size of a working set since more data must be kept in the buffer pool.
The next obvious restriction is that there is a clear loss in functionality. You can no longer easily perform aggregation functions on the data (MIN, MAX, AVG). You are storing the data in a way that is very specific to one application, and you can't just point a BI tool at it to process.
It can become difficult to apply even the simplest constraints on the data such as character length, if an age must be a number, and if the age must be unsigned. MySQL doesn't go as far as having check constraints, but what it has is better than nothing.
Some smaller issues I have is that if you are using a standard serialization technique like JSON it's difficult to store pieces of data like numbers or IP addresses in their most efficient form, and that technically this design breaks 1NF.
What would I recommend?
Josh Berkus (of PgExperts) calls this serialization E-blob. Much like EAV, he criticizes this as one of 10 ways to wreck your database (slides). I tend to side with Josh. I'm optimistic that this design has its place in a smaller component of your application, but I'm weary every time I hear someone decide to switch to it exclusively.
Before making such decisions for yourself I would recommend reading Peter's KISS KISS KISS post.
We recently released XtraDB-9, and while we did not highlight it in announcement, the release-making feature is ability to save and restore InnoDB buffer pool.
The idea is not new and was originally developed by Jeremy Cole (sorry, I do not have the link on hands) some time ago, and now we implemented it in XtraDB.
Why would we need to save and restore content of buffer pool ?
There are several reasons.
First, it's not rate on modern servers to have 32GB+ of RAM, with allocated InnoDB buffer_pool 26GB or more. When you do restart of server, it may take long time to populate cache with useful data before you can bring it back to serve production load. It's not rare to see
maintenance cycle takes two or more hours, mainly because the slave need to catchup with master and to warm cache.
In case with the server crash, it is even worse, you need to wait possible long time on InnoDB
recovery (we have the patch for that too, in that post you can see InnoDB recovery took 1h to accomplish) and after that warm caches.
Second, it is useful for some HA schemas, like DRBD, when, in case of failover, you need to start passive instance on cold.
To save buffer pool you execute select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_DUMP*/;
and to restore select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_RESTORE*/;
it will create/read file ib_lru_dump from your database directory.
You may want to sort ib_lru_dump in order of pages in tablespaces, so RESTORE will be
performed in most sequential way. The small python script lrusort.py to sort ib_lru_dump is available
in our Launchpad branch lp:~percona-dev/percona-xtradb/extensions-1.0.6
I made small tpcc benchmark to show effect with restored buffer_pool (the condition of
benchmarks are the same as in my runs on fast storages, and I
used RAID10 to store InnoDB files).
First run (xtradb cold) I made just after restart and ran it for 1h.
After that I saved buffer_pool, restarted mysqld, restored buffer_pool ( it took about 4 min
to load 26GB worth of data), and run (xtradb warm) tpcc again.
Here is graphical results (results in New Transactions Per 10 sec, more is better):
As you see in the cold run it took 1500-1800 sec to enter into stable mode, while
it warm run it happened almost from start. There was some period of unstable results, but it
did not affect ability to serve load.
You are welcome to test it, it is available in XtraDB-9 release and also in MariaDB 5.1.41-RC.
So you get MySQL or other applications using too much memory on the box or OS behaving funny and using more memory for cache and pushing application to the swap. This causes swapping and causes performance problems. This much is obvious. But how bad is it ? Should you count it same as normal Disk IO as the box is having or is it worse than that ?
Swapping is going to impact your performance more than just normal IO and here are 3 reasons why. If you know more please let me know, for my taste these 3 are bad enough so I have not been looking for more.
Cache in the Swap File will multiply IO compared to just having less cache. What happens when page in cache is replaced which is swapped out itself ? First you have to find space to swap in the page (we're speaking about memory pressure right?) which means swapping out some page. This would normally happen in background but still it has to be done. When the pages is swapped in which is second IO and finally you get page being cached read from the hard drive to the cache. This gives us 3 IOs instead of one. Nice
Flushing dirty pages or even discarding the page will cause extra IO slowing things down.
Skewing up all Algorithms The database internals algorithms are tuned for things being in memory and if they start dealing with data which is on disk they just often stop working with any reasonable level of efficiently - when database deals with on disk data it often uses different set of algorithms which are optimized to limit number of IOs or make them more sequential. Most of these were designed before SSD era. For example Insert Buffer in MySQL makes a special effort to avoid (delay) IO. If it happens to go to swap file it will more than defeat its purpose. Number of background threads are designed with assumptions they can check page statuses in buffer pool very efficiently which again stops working as soon as page accesses cause disk IO.
Escalated Locking/Latching If breaking of internal operations is not bad enough on its own lets see what swapping does for concurrent (multi CPU, multi client) processing. Database Locks/Latches are typically designed to be held for as short time as possible. The less portion of execution time thread spends holding exclusive locks the better system will scale. It is a big no-no to hold any critical locks while you're doing disk IO as IOs take a long time. When system is swapping all of these gets messed up - when database is thinking it is taking the locks for few instructions only it can be a long while while IO completes - if this is critical lock it is possible to see everything in the system waiting on this IO, even transactions which work with data which is not in the swap file.
The bottom line: You should configure system so no swapping activity is going on during normal operations. The swap file itself may be justified - if you have some unexpected memory consumption spike you may prefer to see slowdown instead of MySQL being killed because of out of memory but do react on them promptly and do not treat such situation as normal.
I appreciate opportunity Jos van Dongen from Tholis Consulting gave me. He granted me access to servers with 8 attached Intel X-25M 80GB MLC cards. The cards attached to 2 Adaptec 5805 raid controllers, with 4 cards per controller.
The cost of setup is 8 x 260$ (X-25M) + 2x500$ (Adaptec 5805) = ~3000$.
Available space varies in depends on raid setup from 300GB to 600GB.
The logical comparison is to compare results with FusionIO 320GB MLC card, so I will copy results from FusionIO 320GB MLC benchmarks.
For benchmarks I used sysbench fileio benchmark.
All raw results are available on Percona benchmarks wiki, there I will highlight most interesting points.
Couple words on tested setups. We used two configurations:
In first runs (software) each card connected as individual card to Adaptec, so in OS we see 8 individual cards. Card configured in software RAIDs: RAID0, RAID10, RAID5, RAID50
In second round (marked as hardware) each Adaptec is configured as hardware RAID0 over 4 cards, so in OS we see 2 devices. Devices connected in software RAIDs: RAID0 and RAID10 with different OS schedules over each device.
I should highlight I do not see usage in production for RAID0, as in my opinion SSD cards
are not reached enough level of reliability yet (see comments to post FusionIO 320GB MLC benchmarks), however I put results here to show theoretical maximal results.
So let's start with random reads:
I'd say SSDs show comparable results with FusionIO on 16+ threads, however on 4-8 threads difference is significant. On SSD you can get 160MB/sec for 4 threads and 260MB/sec for 8 threads.
Random writes:
There couple things to note (beside SSD is doing much worse than FusionIO).
1. Something is wrong with scaling random writes in this setup. It is point for research,
I think there some serialization in Linux software raid or Linux scheduler or on Adaptec hardware level.
2. Cards connected in hardware raids show worse results than card connected as single devices (you can see results on http://www.percona.com/docs/wiki/benchmark:ssd:start, in summary table, randwr rows)
3. For cards connected in hardware raids, DEADLINE performs much worse than CFQ.
Sequential reads:
- For sequential reads you can get pretty decent results from SSD (230MB/sec on 4+ threads)
- Cards connected in hardware raid is doing much better
- DEADLINE outperforms CFQ here (which is different from random writes)
- Software raid0 performed pretty bad, so I chosen to show hardware-raid0 results
Sequential writes:
I'd say sequential writes is hard task for both SSDs and FusionIO, it does not scale well.
You may want to look into another options if your load requires sequential writes (e.g. like I
put InnoDB transactional logs on rotation based drives instead of SSD in my InnoDB on FusionIO benchmarks).
So in summary I can say
With SSD drives you can get decent results for random and sequential reads.
I think it is competitive vs FusionIO if we look into price/performance analysis (remember FusionIO is twice as expensive)
Random writes did not work for me as expected, and this is point for investigations
RAID5, as expected, is only competitive for reads, but not writes
Complexity of having 8 SSD drives may be significant. You may want to
look into different options: connect them in software or hardware raids, what scheduler to
pick, etc. I suggest to run sysbench fileio or similar (i.e. iozone) to check if you get performance you expect
In my opinion maintaining 8 SSD cards per server is much more hassle than
deal with single FusionIO card, however there is important point that with SSD you may
"hot"remove and insert cards, while for FusionIO, which put into PCI-E bus, you need
to shutdown server to replace it
I generally thought about MySQL replication as being quite low overhead on Master, depending on number of Slaves. What kind of load extra Slave causes ? Well it just gets a copy of binary log streamed to it. All slaves typically get few last events in binary log so it is in cash. In most cases having some 10 slaves are relatively low overhead not impacting master performance a lot. I however ran into the case when performance was significantly impacted by going from 2 to 4 slave on the box. Why ?
The reason was Master was having a lot of very small updates going on - over 15.000 transactions a second. Each time event is logged to the binary log all 4 threads feeding binary logs to the slave were to be woken up to send the update notification. With 4 slaves connected this makes 60.000 of threads being woken up sending some 60.000 packets (there may be some buffering taking place on TCP side merging multiple sends but still)
I guess this scenario is just not really caught any developer attention yet as it should be rather easy to optimize. Same as network cards are designed to throttle numbers of interrupts they get and process several packets at the time we could make replication threads to be woken up in the batches. For example we could tune the system to wake up the thread feeding slave no more often than 1000 times a second and each wake up even would send multiple events to the slave. It should be possible to make this number tunable as more rare wakeups are less overhead but they are also can impact replication latency a bit. It is also possible to get some auto detection in place timing how long it really takes all threads to send their data to the slave. If you have large amount of slaves the delay from the event executed on the master to last thread sends packet to the slave can be significant.
What does this case teach me in general ? To always look for data rather and question your assumptions. If something is unlikely to be the bottleneck it does not mean it is not
After a nice long vacation, it's time to unveil our destinations for public classes in 2010. We are now offering a course for Developers as well as DBAs. The dates are:
Do you run a meetup group in one of the cities listed? Get in touch! I would be happy to come along and give a presentation, or just answer your MySQL questions over a dinner or beer. You can reach me via @percona or email.
Finland-based Aito Technologies, developer of a customer experience analytics product suite, today announced its plans to employ Sun Microsystems’ MySQL Embedded Database Server software to successfully analyze up to billions of phone calls, text messages, and mobile data sessions on a daily basis.
Recently I moved from windows servers 2003 32 bit and IIS6 to windows servers 2008 64 bit and IIS7.
My experience was rather painful, windows 2008 seems like a wild animal after windows 2003 experience.
2008 has lots networking and scalability issues and it took me quiet a while and many tweaks and hacks
to achieve reliability and good performace plus scalability and solve many issues.
If you own a windows 2008 server under some significant load from the web issues described in this post maybe relevant for you. Maybe it may save you some time in desperate searching around the net for solutions...
Before we continue to windows 2008 and IIS7 tips - some relevant info about IIS7.
IIS7 breaking changes
IIS7 introduced new integrated pipeline model where asp.net pipeline is integrated into IIS
which has extensibility and performance benefits. This comes together with some breaking changes to configuration and asp.net.
IIS7 has static compression turned on by default and dynamic compression turned off by default.
My first tip is to disable static compression (website - compression in IIS7 manager).
It can be the problem when you have static XML files on you server for example.
XML files become unreadable by XML parsers with IIS7 static compression. Browsers and other 3rd parties will not be able to read them anymore because XML is malformed (possible bug?).
Change max concurrent requests per CPU setting
By default IIS7 has a limit of handling 12 concurrent requests per CPU and will queue requests above this limit. If you have some significant web load and many AJAX style requests to your server - this setting maybe very restrictive and it is hard to find out the root of the problem when you server performance is suddenly degraded.
See
this post
to get the picture how it can cause performance issues on your server.
Some relevant info about asp.net thread usage on IIS7 here:
Thomas Marquardt advice is to change this default limit. Recommended settings:
"All of this may be a little confusing, but for nearly everyone,
my recommendation is that for ASP.NET 2.0 you should use the same settings as the defaults in ASP.NET v4.0; that is, set maxConcurrentRequestsPerCPU = "5000" and maxConcurrentThreadsPerCPU="0".
This is done by adding DWORD MaxConcurrentRequestsPerCPU to the registry under
called MaxConcurrentRequestsPerCPU (DWORD). This key doesn't exist by default. Or/and in aspnet.config section which overrides registry setting (also doesn't exist by default)
aspnet.config is here on windows 64bit:
You need to add the following section under "configuration" section (here I used default values) -
don't forget change maxConcurrentRequestsPerCPU to 5000.
Disable offload network enhancements if you have network issues
Different unexplained network issue on your windows 2008 server
could be related to TCP Chimney and related "networking enhancements" in windows 2008 and certain
hardware vendors when hardware doesn't play well with these enhancements.
If you have networking problems similar to these below - try to disable these features.
Set Connection "close" request header explicitly posting to AJAX web services
When moving from IIS6 to IIS7 integrated mode you may experience issues similar to below
(POST request time-outs and request aborted errors) and all related to AJAX POST requests to IIS7 while AJAX GET requests don't have any problem.
I experienced the same issue. Some HTTP POST requests got aborted in IIS7 pipeline and never reach server code. It can happen one per 1000 requests. However it may be still a significant number and if you have many users it can degrade performance for other requests.
In HTTPERR logs you can notice Timer_EntityBody errors related. POST requests usually arrive split in 2 segments: headers and following request body while GET requests arrive in one segment. On network monitor you can notice that server received POST request headers but request body never arrives to the server and after some timeout IIS aborts such request.
The solution is to set Connection "close" request header on the caller explicitly.
Somehow IIS7 doesn't treat it very well, while IIS6 is able to handle missing Connection "close" header.
Go figure.
Sun Microsystems today announced Cashpoint, the Austrian betting and gaming company, has selected Sun's MySQL™ database to manage its customer, Web, and results data. In order to protect its systems' high-availability, Cashpoint uses both a MySQL Enterprise™ Unlimited subscription and the MySQL Cluster database product. Employing an enterprise-wide MySQL Enterprise subscription along with MySQL Cluster contributes to considerable cost savings for the company.
Sun Microsystems, Inc. today announced the details of the Sun Tech Days worldwide developer conference for 2009-10 at http://developers.sun.com/events/techdays/index.jsp. Sun Tech Days is a multi-city world tour designed to showcase how the developer community can leverage Sun technologies, services and products to drive the next generation of industry innovation. Since 1998, the Sun Tech Days worldwide conference has brought together hundreds of thousands of developers and students to learn about technology from industry experts.
In a procedural language without the use of threads (or Inter Process Communication via forks), to execute I/O requests they are done one after another. Synchronous Queries produce at best a Big-O of N such that N is an element of I/O communication (queries) and N equals the number of queries needed to achieve the requested dataset. With IPC or threads we can speed up common O(N) problems to reduce the N with parallelism, its still functionally a O(N) yet from a single instance point of view N is much less because threads (IPC) takes that Serial computing component and executes the code in parallel. To better explain what I am talking about lets look at some PHP code:
foreach($friends as $friend){ $data[] = getMySQLData(“SELECT * FROM AccountData WHERE userid = $friend); }
The Primary key for the AccountData table is userid. Assuming that you have 5000 friends, the query has to be executed 5000 times. We can reduce the O(N) and change it to a O(nlogn) (Binary Tree - doesn't take into account other factors) by switching the query to
$data = getMYSQLData(“SELECT * FROM AccountData WHERE userid IN (….)”);
We just sped up the retrieval of the data significantly, yet we just introduced a bottle neck on the datalayer. Our architecture requires that the data is located in a single location.
What if AccountData’s data is spread across many servers federated by userid? This means that userid belongs to a server, so the server contains a shard of the AccountData’s Data.
Now we are back to a O(N) where each query needs to be executed on the corresponding shard. The logical next step is to group queries per shard and run across them all. For instance
$multiShardIDs = $genericShard->getMultipleShardIDs($objIds); foreach ($multiShardIDs as $shardID => $shardUserIDs) { if (stripos($orgQuery, " WHERE ") !== false){ $query = $orgQuery." AND {$column} IN (".implode(',', $shardUserIDs).") "; } else{ $query = $orgQuery." WHERE {$column} IN (".implode(',', $shardUserIDs).") "; }
$shard_to_sql[$shardID] = $query;
.... more stuff ....
Yet this is still a O(N) its just that N is smaller. Each query is still executed serially. Let’s look at some stats of synchronous queries of SELECT 1; This query is executed across 35 shards and the timings are from PHP point of view.
Field
End Value
Start Value
Delta
ru_oublock
0
0
0
ru_inblock
0
0
0
ru_msgsnd
0
0
0
ru_msgrcv
0
0
0
ru_maxrss
0
0
0
ru_ixrss
0
0
0
ru_idrss
0
0
0
ru_minflt
9872
9865
7
ru_majflt
0
0
0
ru_nsignals
0
0
0
ru_nvcsw
11344
11114
230
ru_nivcsw
977
968
9
ru_nswap
0
0
0
ru_utime.tv_usec
865054
849053
16001
ru_utime.tv_sec
16
16
0
ru_stime.tv_usec
556097
552097
4000
ru_stime.tv_sec
1
1
0
Total Execution Time
0.18323707580566
As you can see, to execute this from PHP it took 100 ms, 100s pages reclaimed and 200s voluntary context switches to query 35 servers.
Now let’s look at Asynchronous execution of SELECT 1; // the query generation is from PHP yet the execution is performed on a server that executes the query in parallel
Field
End Value
Start Value
Delta
ru_oublock
0
0
0
ru_inblock
0
0
0
ru_msgsnd
0
0
0
ru_msgrcv
0
0
0
ru_maxrss
0
0
0
ru_ixrss
0
0
0
ru_idrss
0
0
0
ru_minflt
9131
9121
10
ru_majflt
0
0
0
ru_nsignals
0
0
0
ru_nvcsw
3891
3889
2
ru_nivcsw
290
290
0
ru_nswap
0
0
0
ru_utime.tv_usec
596287
596287
0
ru_utime.tv_sec
4
4
0
ru_stime.tv_usec
460028
460028
0
ru_stime.tv_sec
0
0
0
Total Execution Time
0.019363880157471
As you can see from the table above executing the query asynchronously produced results with less context switching, less pages reclaimed and almost 10 times execution improvement over the synchronous query counterpart. How is the asynchronous query executed? Lets take a look at the figure below.
So A user comes through the firewall / load balancer with a HTTP Request to the www pool that runs PHP. PHP now makes a CURL request to the Async Shard Servers (through a LB same LB different PORT). The HTTP Request to the Async Shard Server contains the SQL we wish to execute. The Async Shard Servers has a thread per shard and executes the request in parallel. The results are merged and sent to the calling CURL process via JSON. The returned JSON is then converted into a PHP object. This is a typical three-tier environment.
When having to query multiple servers using an Asynchronous Tier is dramatically faster; in fact its as fast as the slowest server. This is the main sticking point of why asynchronous queries are faster then synchronous queries (in this context) since the total execution time for serial queries is the SUM of all the query execution.
The current version of the server is used for Friend Query execution across the datalayer. Its been solid for a few months now, and I'm currently getting permission to release it as an Open Source Product. The features this server contains:
Lightweight
CPU bounded
Scales Linearly
A Timer Thread to keep the database config up to date in memory and fetching the config from PHP so if PHP changes connections to the shards so does Java
Uses Java-6 Executor Service
Merges the result set prior to sending it to the calling process
Sun Microsystems, Inc. today announced that CASPUR, a non-profit, consortium of Italian universities focused on scientific supercomputing and innovative technologies, has subscribed to Sun's MySQL Enterprise™ database service.
E.ON ES, a subsidiary within E.ON Group, today announced that it has purchased a multi-year MySQL Enterprise™ database subscription from Sun Microsystems to help power its advanced work order management system, used for the service and maintenance of Sweden's electricity distribution grid.
I am not a fan of upgrading mySQL unless I need to. I am of the mind if it is not broke don't fix it, but when I do upgrade I follow these general steps.
If I have run into a mySQL bug, I look to see if that bug is fixed by searching the mySQL bug database.
If I've notice a performance bottleneck, I look to see if the performance bottleneck has been fixed by searching the same database.
I will NOT upgrade to the latest and greatest version of mySQL (5.4) I stay within my branch (5.0).
These are my three general motivations that drive my upgrade decisions. Anytime I upgrade I also make a list of things that might affect my environment for the stuff I use.
Ignore all the NDB changes... I don't use it and that's the majority of fixes. This is also, why I do not use it.
List the changes that will affect the production environment
Deploy the version that I picked on a few servers running my original config
Do data corruption tests (make sure my checksum scripts return the same data)
Verify that the problem I'm trying to fix is fixed
Deploy to more boxes
Let the new server bake for a period of no less than a week
Deploy everyplace
So now, I'm upgrading from 5.0.56 to 5.0.86. What I'm trying to fix is mysql memory overhead at high levels of ram.
For instance, I have a slew of 48GB boxes. I set the bufferpool to 40GB; the OS uses 1 GB of memory (roughly) leaving an overhead of 7GB for the system cache and various spikes of sort buffers. Over a period, I see that mySQL will consume and hold onto 47GB of memory for an unknown reason even with some tight my.cnf settings. (I'm certain they are tight since I know what each buffer does). Therefore, testing some later versions of mySQL we found that these later versions do not grow past the settings defined yet performs the same.
Next, since I decided that upgrading is a good solution, now it’s time to list all the changes that fixes things.
Therefore, overall, upgrading should give me a boost in performance. My own internal testing sees some tighter memory usage, even though this is not fixed explicitly, the product has matured overall so I can account for the reduction in memory to that.
Request time is proportional to server load. If the application response time is big so will be the server's load. To reduce server load, reduce the wait time in the application's response time to serve the request. Below are some steps that I took to remove 1.8 ms overhead on every request to my web server farm.
Tools Needed: vi strace top
Use vi to look at your include path in php.init. Next use top to find which apache process is consuming the most cpu resources. Use strace -p [TOP HTTPD PROCESS] -T (-T is for deltas).
In my example the include path is /usr/share/pear:/usr/lib64/pear:.:/var/www/html/httdocs/
lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000033> lstat("/usr/share", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/share/pear", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/share/pear/ams", 0x7fbfff1690) = -1 ENOENT (No such file or directory) <0.000033> open("/usr/share/pear/ams/include/FreqCapInfo.php", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000036> lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000037> lstat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=65536, ...}) = 0 <0.000034> lstat("/usr/lib64/php", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/lib64/php/pear", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000038> lstat("/usr/lib64/php/pear/ams", 0x7fbfff1690) = -1 ENOENT (No such file or directory) <0.000037> open("/usr/lib64/php/pear/ams/include/FreqCapInfo.php", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000038> open("/var/www/html/ams/include/FreqCapInfo.php", O_RDONLY) = 24 <0.000043> fstat(24, {st_mode=S_IFREG|0775, st_size=6707, ...}) = 0 <0.000031> stat("./ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000037> stat("/usr/share/pear/ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000038> stat("/usr/lib64/php/pear/ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000036> stat("/var/www/html/ams/include/FreqCapInfo.php", {st_mode=S_IFREG|0775, st_size=6707, ...}) = 0 <0.000040> close(24) = 0 <0.000033> mlock(0x552b876be0, 24) = 0 <0.000092> mlock(0x552b8df910, 10624) = 0 <0.000038> munlock(0x552b876be0, 24) = 0 <0.000036> munlock(0x552b8df910, 10624) = 0 <0.000032> mlock(0x552b876be0, 24) = 0 <0.001707> mlock(0x552b8df910, 10624) = 0 <0.000009> munlock(0x552b876be0, 24) = 0 <0.000007> munlock(0x552b8df910, 10624) = 0 <0.000007>
Looking at the strace, 15 unneeded system calls are made on every request, each request roughly takes 30 micro seconds, for a total of a few ms wasted on every request. False positives are adding overhead to ever requests since the include path is not optimized. A Bloom Filter in Shared Memory would be perfect for for this part of PHP-but that's besides the point.
So, change your include path .:/var/www/html/httdocs/:/usr/share/pear:/usr/lib64/pear
For my example I changed the include path to the above. In my environment we don't do many PEAR loads, so it makes sense to use our directories 1st.
By doing this 15 erroneous system calls have been removed. Note: If you use the php feature __autoload make sure to protect your calls with file exist or you may be doing a require once on a file that is in a different directory which is a PHP fatal Error.
UPDATED: Some more goodies to reduce load on PHP boxes:
For PHP 5.2 there is a nice new feature that you can play with
realpath_cache_size. This is a directive native to PHP by default it's set to 16K, this means that PHP expects there to be very few files, but big files. In most environments I have been in this is not the case. The case is there are a lot of files, a lot of directories thus 16K is not enough.
I've tested a few values but 128K seems to be my sweet spot.
As announced previously backup facebook inbox functionality is now available on Fosimo. It took some time until the app was whitelisted on Facebook in terms of reading users inbox permission but now it should be operational. If you already have Fosimo 1.0.8 it should work.
Download here. Hurry up because maybe it is the first app on Facebook which is capable of doing this. For now your inbox is saved in XML format.
What is Nagios? Nagios IMHO is the best Open Source monitoring system out there. It supports hosts checks, a level to determine on a host level if a box is considered "up". It supports service check, a level to determine if a particular service such as mySQL is up. It has features to log all events to a flat file or to a DB. It can notify you when a service is in a warning state, error state or unknown state.
For the purpose of this article, I am going to talk about handling events such as a clearing up swap.
First, let us look at some configuration of Nagios. We are going to define a command, then service acting on that command. Let us assume that the nagios install is in /usr/local/nagios.
Therefore, in /usr/local/nagios/ a few configuration files are key: - /usr/local/nagios/etc/objects/commands.cfg - the command file where the checks are defined - /usr/local/nagios/etc/hosts/*/hosts.cfg - the services file where the checks are defined for execution based on other directives in this file.
This says that check_local_swap executes check_swap with a warning threshold of $ARG1 and a critical threshold or $ARG2
Next when defining a service for a host
define service{ use generic-service; Name of service template to use host_name dbfacebook34b ; hostname service_description SYS:Swap ; what shows up in alerts is_volatile 0 check_period 24x7 ; threshold when to check (all the time) max_check_attempts 4 ; threshold to check before marking state event_handler handle-swap ; handle an event (another command) normal_check_interval 5 ; in seconds retry_check_interval 1 ; only try once before reporting the state contact_groups itops ; contact group to send notifications to notification_options w,u,c,r ; need to look this up for all defs notification_interval 600 ; retry sending notifs every 8 mins notification_period 24x7 ; keep sending them check_command check_nrpe!check_local_swap!80%!55% ; execute the event handler and warn like hell }
Lots of goodies as you can see. Let us look at the event handler
This means execute this script whenever any event for swap occurs (I decided to make this simple and not put a threshold on this).
What does handle_swap.pl do - well it’s a perl script that looks at free memory and if only a few 100K of swap is in use, swapoff -a; swapon -a;
In this case, it is a bit safe to do this. Why do this? Why not just turn of swap. I have talked in depth about this subject-but for a minor recap. Linux needs swap else, kswapd will freak out. Swap in DB's is bad so I clean it up automatically since O_DIRECT on my SAN is not an option.
Why not just run a cron job? Nagios keeps a log, I like to review what is happening from a central location, and nagios is freaking COOL.
Sun Microsystems, Inc. today announced an update to its MySQL Enterprise™ subscription, designed to assist corporate users improve the speed and uptime of their MySQL™ database applications. Thirty-day trial subscriptions of the new MySQL Enterprise Fall 2009 Release are now available free-of-charge from http://www.mysql.com/trials.
Fosimo 1.0.8 is available. Fosimo is a desktop app for Facebook which helps you control your Facebook account without browsing through Facebook.
Facebook has rolled out several important features recently including new inbox functionality today and new authentication scheme for desktop applications a couple of weeks ago so there are new features and adjustments.
New in Fosimo 1.0.8
Improved permissions configuration and authentication (supports recently introoduced "Publish Stream" and "Read Inbox" permission).
Improved photo upload functionality.
Improved set status (removed "is" verb). This is an old bug on Fosimo but finally fixed.
Many fixes and enhancements.
Backup facebook inbox: now you can backup your inbox on facebook to your hard drive with 1 click. Please note: Facebook is rolling out new inbox functionality now so it may take a couple of days until this feature is fully available for everyone.
I didn't try Windows 7 yet. But seems that Windows 7 inherited user access control feature from Windows Vista as is. It maybe useful info for people running MSIs on Windows 7.
I learned it because lifehacker linked yesterday to my 2-year-old post about MSI Error 2869 on Windows Vista where I explained the error and how to deal with it (lifehacker had the same MSI problem on Windows 7). In short — the error is caused by user access control feature and you need to run MSI from command line as administrator as a workaround (or patch the MSI). I was a bit surprised yesterday while reading that 2-year-old post since it seems very techy and it took me a while to fully understand it :). Good coverage after all...
For the first time ever I took a two-week vacation. How can a person who has 100s of database servers, 100s of web servers, and a system that supports 100K tps across 20 TB of data is able to take a vacation? Easy, I have a great team that is very competent in managing the platform by following our cookbook routines and guidelines for new application interaction with the databases.
Where did I go? I went to NYC-stayed in Tribeca and only ate Pizza, Hotdogs, White Castle and Hala Food from vendors that are nearly on every street corner. I also lost 6 pounds! After three months of half-ass dieting to prepare for my Pizza diet, who would have known that just going for it was the key to success! How did I do it? Well, first I walked to Little Italy, bought 2-4 slices, ate them, and then walked around the city. That is it. As a Database Dude, I do not do much walking except for when I am too tired to run on the treadmill for 20 mins. I was averaging around 10 miles a day for a 12-hour period walking around NYC taking pictures, enjoying the sites, eating, drinking, and then repeat. I even walked from Canal Street all the way to the Staten Island Ferry just because. Man it was great. After NYC, I went to Puerto Rico for the second time. My family and I had a freaking awesome time. We went to the only US National Park that is a Rain Forest. Went to Vieques to the Bio Luminous Bay, where the single cell organisms light up when kinetic energy is applied to them. Went to Culebra, to Flamenco Beach – which is off the hook: much better than any other beach I have been to. Drove around the entire Main Puerto Rican Island stopping off in Rincon and eating every 3-4 hours. I am fully rested and feel like a 20 year old again. So, expect a lot more posts. I think the order will be:
Migrate 20 TB of data into a new Format without downtime. How to make Friend Queries work with a database. Throughput increase from Asynchronous Queries.
Back in February 2009 we crossed 1 million users mark across all our applications on all platforms combined (link).
In July (or even in June) we reached 2 million. Right now the number is roughly 2243K total users (with more than 1 million users on Friendster alone).
Right now we have 13 applications with more than 50K users each.
Fosimo.TR 0.5.6 is available for download. Fosimo.TR is a light and simple Twitter client for windows which helps you keep track of your Twitter account without browsing through Twitter.
New in Fosimo.TR 0.5.6
My @responses: now one can view all @responses on twitter in real time (or "mentions" as twitter calls it). It is one of the items of the main menu.
Quick tweet: double-click on the tray icon will open a tweet window for a quick tweet.
When moving the mouse over animating icon it slows down. Users complained that small boxes move too fast and we finally address this issue slowing down each box after the mouse was over it.
We have more on our TODO list. We will roll up new versions regularly.
Fosimo.TR 0.5.5 is available for download. Fosimo.TR is a Twitter client for windows which helps you keep track of your Twitter account without browsing through Twitter.
New in Fosimo.TR 0.5.5: friend list paging. Twitter has a limitation of 100 rows of data per API call therefore for those who have more than 100 friends on twitter (meaning people you follow) we implemented paging. I have 280+ for example.
Also we restored a click-on-animation functionality. Now one can click on animation and get their friend's details in a single click. This way people are able to retweet/reply in just 2 clicks.
A couple of small fixes included in this version as well.
Sun Microsystems, Inc. today announced that TweetMeme, the innovative online application that tracks the popularity of links sent via Twitter, has subscribed to Sun's MySQL Enterprise database offering in order to keep up with the microblogging service's explosive growth.
Created in 2008 by pioneering Web 2.0 startup Fav.or.it, TweetMeme gives the rapidly expanding Twitter community a means of easily seeing and sorting the most popular links on Twitter. TweetMeme also provides a realtime search facility of fully expanded, qualified and indexed links so that users can get answers to queries about very recent events or news.
So, when you connect on the fly to a database your subject to a variety of issues, like when the db is not available and when the db does not have a route.
One of the main reasons why a dev may want to connect on the fly is because they have too many front ends to hold a persistent connection on the backends. Since mySQL does not use libevent, holding open threads to mySQL is much more costly. Threads == Memory.
But, that's here nor there. The main purpose of this post is to talk about how to recover from failed connections that block apache threads.
Common Failures: No route to Host Flapping NIC Locked Tables Recovering from a Crash more of the same.
My Environment: I have a bunch of webservers (200+) that all have 300 possible threads (60000 possible connections to a single DB) behind a load balancer that uses the LB least connections protocol to distribute load across all 200+ webservers.
Since PHP is used in a stateless mode (no guarentee that the same user will hit the same server), I have to have some way of telling all other apache procs for said box that a server is down. I use stateless on the fly connections, so each apache proc will test the connection.
Here in lies my most common problem. If a box in the backend dies, all apache threads will block for a predefined time-out.
In /etc/php.ini (it could be in /etc/php.d/mysql.ini) I set
mysql.connect_timeout = 5 // the default is 60
In my common_db class when connecting to a database, the connect routine returns a database handle object
if (PEAR::isError($dbh) && $delta_to_connect >= 5){
Now PRIOR to calling the database connect code I check to see if the IP is up.
$status = apc_fetch($ip); if ($status[0] === 'DEAD'){ return false; }
... do connect ...
But, there could be a variety of issues that can call false positives, like network flaps, someone blocking the db for some time etc. So, I'll allow one request .1% of the time bypass the status check to try again.
But, if your app can't connect to the DB aren't you down anyway?
Yes, although I don't have to restart all the httpds. Also each DB has a redundant pair to when returning false, the app code will try the other set of servers.
I have a variety of methods to deal with these scenarios but this is the quickest to implement.
Infoworld has an article "The fight over open source 'leeches', about the failure of enterprises to contribute the changes they make to open source software back upstream to the project.
"When it comes to open source communities, individuals are much better citizens than institutions. The enlightened self-interest that causes individuals to send back bug fixes, contribute ideas for new features, and write documentation is much harder to find in institutions," Dan Woods, CTO of Evolved Media, wrote in Forbes earlier this year.
I have a very old article in my german language blog about Open Source Software and companies, and the infoworld article triggered me now to translate it:
Many companies do use open source, and that is a good thing. Most of that software is well suited to the task, and usually it is much easier to understand, more flexible and also easier to debug than their closed source equivalents. But open source requires that enterprises adjust their process with regard to changes and deployment of software in order to be successful open source users.
The fact that you can change open source software lures many companies into making changes to the software. But making the changes is just a small part of the work, and the most inexpensive one.
There are two PRIMARY directions to sort data in SQL: Ascending (ASC) and Descending DESC. When these two sort definitions are put together in a single statement a filesort is produced.
Why do we want to avoid filesorts?
Filesorts are bad. 1st they tickle a thread based buffer called sort_buffer_size. Additionally filesorts reads the data twice, unless max_length_for_sort_data limit is reached and as a result the Filesort runs slower to reduce disk I/O. If you want filesorts to run faster at the expense of the disk increase the default max_length_for_sort_data. You can read the filesort algorithm here.
So, here is an example
CREATE TABLE `ABCD` ( `A` int(10) unsigned NOT NULL default '0', `B` int(10) unsigned NOT NULL default '0', `C` int(10) unsigned NOT NULL default '0', `D` int(10) unsigned NOT NULL default '0', PRIMARY KEY (`a`,`b`,`c`,`d`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC, d ASC\G
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: ABCD type: ref possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: const,const rows: 2 Extra: Using where; Using index; Using filesort 1 row in set (0.00 sec)
Notice the filesort? So how does one get around this filesort?
Well
Let's define some roles for columns C and D. C is the parent while D is the child.
We want all the latest parents (C) We want all the oldest children (D)
We require pagination of all the PARENTS (show 10 parents per page) so Queries like this is PRODUCED
SELECT * FROM ABCD WHERE A=? AND B=? ORDER BY C DESC explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ABCD type: ref possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: const,const rows: 2 Extra: Using where; Using index 1 row in set (0.00 sec)
Now
FOREACH($C_parent as $i => $c_id) {
$C_parent[$i] = SELECT SQL_CALC_FOUND_ROWS * FROM ABCD WHERE A=? AND B=? AND C=$c_id ORDER BY D ASC LIMIT 1;
}
So, we changed 1 query into 11 queries (10 parents per page) to make the page load happen faster, by getting rid of the filesort.
What 11 queries is faster then 1? Yes, for this case it is. The reason is because filesorts are SLOOOOW, they chew up a lot of limited resources and they should be avoided. I've see filesorts take close to 50-60% of the query time.
This is the translation of an article from my german language blog. It is not a literal translation, but has been amended and changed a bit to take more recent information into account.
It started out as a discussion within the german language MySQL group in USENET. There the eternal question came up why phpMyAdmin gets no love at all from the helpers and regulars in that group. My answer was:
phpMyAdmin (PMA) like many other GUI tools for MySQL has a number of limitations. For a web tool such as PMA these come from its operating principles and can hardly be changed. But let's start at the beginning:
In MYSQL the connection is a special context or scope for many things. At least the following things are part of the connection scope:
Transactions. A disconnect implies a ROLLBACK.
Locks. Transactions generate locks with writing statements or SELECT for UPDATE. Table locks are generated by LOCK TABLES. Disconnect releases the locks.
The number returned by LAST_INSERT_ID() is cached within the connection context. It is unavailable after a disconnect.
Tables created by CREATE TEMPORARY TABLE. They are deleted on disconnect.
Prepared Statements and the parsed bytecode of stored routines are kept per connection, even if that is the wrong solution from a performance POV.
@-variables. SET @bla = 10 or SELECT @bla := count(*) FROM cookie; are defining variables within the context of a connection. They are lost on disconnect.
SESSION-parameters. SET SESSION mysiam_sort_buffer_size = 1024*1024*64 or SET @@session.myisam_sort_buffer_size = 1024*1024*64 are setting configuration parameters with session scope. They are lost on disconnect.
Replication specific SET-commands such as SET TIMESTAMP or SET LAST_INSERT_ID can affect the behavior of functions such as SELECT NOW() or SELECT LAST_INSERT_ID().
I am calling all of that connection scoped state.
A client that is disconnecting in an uncontrolled way or that does not report a disconnect properly is defective in the sense that all functionality dependent on connection scoped state is unavailable. The opposite case also exists - a client that reuses existing connections may have connections that have an unclean state, which may or may not be a security risk.
Now Sun has been bought, and not by IBM or Cisco, but by Oracle. In the aftermath everbody is asking themselves - what happens to MySQL?
Well, firstly MySQL and InnoDB are now part of the same company. Oracle has been maintaining InnoDB pretty well in the past, and that can only improve.
Will Oracle let MySQL die and try to push their on products into the market? Hardly so. Why should Oracle do that, and if so, using what products?
Sun Microsystems, Inc. today announced MySQL™ 5.4, a new version of the world's most popular open source database, designed to deliver significant performance and scalability improvements to MySQL applications. A preview version of MySQL 5.4 is available now for download at http://www.mysql.com/5.4.
Today's announcement was made at the seventh annual MySQL Conference & Expo being held this week at the Santa Clara Convention Center. With more than 2,000 attendees, it is the world's largest event for open source database developers, DBAs, vendors and corporate IT managers.
Sun Microsystems and Oracle Corporation announced today they have entered into a definitive agreement under which Oracle will acquire Sun common stock for $9.50 per share in cash. The transaction is valued at approximately $7.4 billion, or $5.6 billion net of Sun's cash and debt.
Want to learn how open source software helped Barack Obama get elected president? Don't miss the closing keynote at the MySQL Conference & Expo 2009, "Database We Can Believe In: Stories from the Front Lines (and Server Rooms) of Barack Obama's Online Presidential Campaign."
Co-presented by Sun Microsystems and O'Reilly Media, the seventh annual MySQL Conference & Expo explores "Innovation Everywhere" April 20-23, 2009 at the Santa Clara Convention Center in Santa Clara, CA. With more than 2,000 attendees, it is the world's largest community event for open source database developers, DBAs, vendors, and corporate IT managers. For more information, please visit www.mysqlconf.com.
There are a finite set of cases where mySQL clients will hang on a connection-for a small period of time (seconds) or indefinitely. Most notably no-route to host causes a timeout to occur which in most clients are on the order of seconds to recovery.
In a web environment if a database is connected to on the fly, each connection made should take milliseconds-when the net is healthy. However, when a database server crashes a timeout for each connection takes seconds and there are cases where connections can just hang (recovery of INNODB as an example). For instance have you ever had a ssh session hang and a kill (SIGHUP..) does not work? This happens for mySQL client connections too.
I want the front ends to recover gracefully from a no-route to host, or more notably a hanged connection condition. I want to avoid that damn timeout all together on stateless connections. Why? Because if you use up all your worker httpd threads requests Fail-things crash, bosses get mad, everyone wakes up and you have a meeting about it later. So what are ways around this?
Use a Load balancer
This is cool and all, but is expensive since two or three are needed for HA purposes. In addition, this increases the complexity of managing servers, and most LBs are optimized for HTTPD traffic, not raw TCP traffic.
Use a memcache layer to keep DB state
This is cool as well, but the state server, which collects the state of your servers and updates memcache, is now a single point of failure-and a lot of time is spent to make this resilient. Not to mention all clients across Programming languages (bash scripts, Java, PHP, Python, etc) need to have access to this memcache layer and logically handle the connection the same. This is doable, but there is an easier way.
Use IRON DNS:
IRON DNS is a term I use to build an HA Resolver. IF a box fails a health check, a nagios event handler can tell IRON DNS to set the domain names IP address to 127.0.0.1 or a routable interface that produces a Connection Refused (111 instead of 110).
For an internal network, all database entries should be able to fit in memory. All resolves should take less then 2ms, and updates to DNS entries can happen in just a few seconds across your entire farm. The only drawback is if the DNS server fails to respond your site is boned. Making DNS Failure proof is rather easy-which is another post entirely.
For my solution, I am using DNS. This allows me to recover servers in a shard on different boxes without having to change code. This also allows my environment to recover from blocked I/O events that can spike load on the front-end that make requests slow.
Here are some conditions where connections take to long:
No Route to Host
INNODB recovery
Disk Fails (disk fills up)
Switch Fails / Flaps
Plug-in storage engine crashes, yet mySQL is up
DNS Resolve fails
If you have a better method, or want to advise me on flaws I should watch out for, drop a comment.
Memcache is a fantastic Hash table-very fast and one of the great successes of Brad Fitzpatrick-who in my opinion has done more for the open social movement as an individual then anyone else. I use memcache quite extensively, now I am thinking about adding ZLIB native to compress the value of each key-much like how INNODB does with the Barracuda file format. The theory is with a CPU hit, we can store more data per memcache instance. I've talked to the Northscale guys and they love the idea. What do you think?
for PHP for instance compresses the data on the client side.
There still might be some value compressing the data on the server-side, but now I'm not as motivated.
What might be a good alternative is to compress keys into 8 byte longs in memcached automatically instead of the actual string that can be huge. To give some more detail, 8 byte longs is a 64-bit int. A string can easily be converted into a big int by bit manipulation - and the address space is huge so key conflict is effectively removed.
Cloud computing is the big thing now days, weather you are an app developer using EC2 or the Google APP engine, or a new company trying to build your own cloud product. If you are hosting, or using a platform it costs money. I hate to spend money, especially money that is for my company. If I find an idle server, I use it to 100% utilization (prior to the saturation point).
I needed to build a new application that periodically crawls a website to update various lists. Building a crawler is expensive, especially from scratch. First, you have to define the amount of lag that is allowed from the crawl copy and the real copy. Of course the Project Manager does not want any lag, all events must be caught and near real-time without overloading the source of the data-but I am not hating, it is a challenge. Next, what technology to use, what language to write the app in-what considerations are left to be defined? How does one crawl Gigs, Tera, and amount of data in a guarantee period? On top of that, how much additional hardware is this going to cost. In addition, to be a cloud it needs to have an API so app developers can set, and get consistent data in an expected period. This is a lot of freaking requirements.
Therefore, to solve this issue, I know that mySQL will store the data, but getting the data is the hard part. This is what is going to cost money, lots of it. I looked around common architectures and found that nothing would do what I wanted to do in a cost effective manner. So, I designed my own using Seti @ Home as the basis for the design.
Get to the point already Dathan:
I have turned ever user who views my applications into a collector, using idle bandwidth without knowing who is collecting the data. My user base on spare cycles will fetch a feed of my choosing, and sends that data to my servers without any personal information. Instead of using an Amazon or Google service, I have turned my user base into a cloud to service their needs.
It is rather awesome-I must say. I am able to service the needs of more than 60 million users at the cost of development time, and NO NEW HARDWARE. The cloud does not have to be a service provider-it can be the end user as long as the end user is not impacted by the requests. BTW the team that I manage is freaking awesome-they built my vision with trial and error and a hand waved spec.
Currently the system scales as long as there is enough end users. If I lose all my users then well I am boned, but to support the feeds all I need is 100K nodes at the current rate. With 60 million end nodes, I am cool.
Imagine if Google with Adsense used this install base to tell Google if the data has changed for an arbitrary web address. All it needs is a few people to hit the same url, inform Google that the web address in question has a different checksum, then at that point Google’s crawlers can go fetch it, instead of constantly crawling data that doesn't change. Google would be able to reduce overall server cost significantly, if it just knew what data has changed instead of guessing what data has changed.
By next years Velocity conferance I hope to have a full disclosure on what technology my team used, how my team get around cross domain issues, and how to compute checksums to validate the data.
PS - I designed this, with my team we made it much better and one person implemented it and owns the product from this point on.
Where I work, Merlin is an important tool for us and provides a lot of insight that other, more generic monitoring tools do not provide. We love it, and in fact love it such much that we have about 140 database agents reporting into Merlin 2.0 from about 120 different machines. That results in a data influx of about 1.2G a day without using QUAN, and in a data influx of about 6G a day using QUAN on a set of selected machines.
It completely overwhelms the Merlin data purge process, so the merlin database grows out of bounds, which is quite unfortunate because our disk space is in fact very bounded.
The immediate answer to our purge problem was to disable the merlin internal purge and with the kind help of MySQL support to create a script which generates a list of record ids to delete. These ids end up in a number of delete statements with very large WHERE ... IN (...) clauses that do the actual delete.
I will be attending Community One tomorrow and on Thursday at Marriott Marquis Hotel, New York, NY. I am especially looking forward to the announcements tomorrow which sound very interesting :)
The first day is a free event featuring:
* Cloud Platforms – Development and deployment in the cloud. * Social and Collaborative Platforms – Social networks and Web 2.0 trends. * RIAs and Scripting – Rich Internet Applications, scripting and tools. * Web Platforms – Dynamic languages, databases, and Web servers. * Server-side Platforms – SOA, tools, application servers, and databases. * Mobile Development – Mobile platforms, devices, tools and application development. * Operating Systems and Infrastructure – Operating systems and virtualization. * Free and Open – Open-source projects, business models, and trends.
The second day of the event is focused on Deep Dives with two half-day sessions on MySQL and two full-day sessions on Java and Web development. I will be attending the session, "Using Java EE and SOA to Architect and Design Robust Enterprise Applications."
Following the conference, I will be a panelist at a Cloud Computing Seminar at Microsoft office in NY. It's going to be a long but exciting day!
It will be great to catch up with old and new friends at the event.
Walking a table means, traversing each row, commonly used in building queues, fixing data, or dumping a table. I've recently ran into a problem-caused by an assumption, where walking a table was taking way to long using the method
$pos = 0; do {
$result = SELECT col FROM TABLE LIMIT $pos, 1000; $pos += 1000; } while ($result);
The assumption was since INNODB uses a cluster index, this would traverse the table using the PRIMARY key. This is not the case, its not a problem in INNODB but a bad assumption, that I fell victim to. A table scan to each $pos occurs producing a Big-O of N^2. So, when the query:
SELECT col FROM TABLE LIMIT 1000000, 1000 is executed mySQL will scan all the rows up to row position 1001000 and for each subsequent iteration.
This is SLOOOOW. IMHO since the table is sorted by the primary key, mySQL should optimize this case - but it does not and will not. So, to walk an INNODB table fast, and keep liner time or a Big-O of N an alternative is
$last_id = 0 do {
$result = SELECT col FROM TABLE USE INDEX(PRIMARY) WHERE pkey_part > $last_id LIMIT 1000 $last_id = $result[count($result) - 1]->pkey_part
}while($result);
This dumps a table very fast, almost as fast as doing a count(*) on the PRIMARY KEY.
Another method is to
SELECT col INTO OUTFILE "/dir/file.ids" FROM TABLE;
but the data is local to the database - thus the need for the application to grab data. Another draw back of this method is that the dump produces more disk IO then walking a table off of a key, slowing down access to this table.
In conclusion, even if the storage engine keeps the table order consistent like INNODB does, do not assume that LIMIT 100000, 1000 is equivalent to a file seek of position 100000, without telling the Optimizer to use an index.
Tomorrow, I'll be attending the Executive Seminar on Cloud Computing at NASDAQ MarketSite (NY). Speakers include Dr. Werner Vogels and Mårten Mickos (ex-CEO of MySQL). Big thanks to Amazon and RightScale who were able to accommmodate my RSVP even when the registration had formally closed.
I hope to be able to catch up with Mårten Mickos during the event. In case I do succeed in catching up, is there any question you want me to ask him? You can email me or post a comment.
It's funny that the event site still shows Mårten's title as "SVP of Sun Microsystems’ Database Group."
Came across an interesting post by Bret (co-founder of FriendFeed) about how FriendFeed uses MySQL to store "schema-less" data. According to the post, they weren't having issues with scaling existing features but rather they were experiencing pain when trying to add features.
Now the way they are using MySQL is interesting and bizarre at the same time. At a very high level, it seems their approach is to use a RDBMS as if it is a column-oriented database. Of course, it makes me wonder why not just use a column-oriented database? I need to read the post again in the morning (too tired right now so just gave it a quick glance).
I am very interested in hearing thoughts from my peers at Planet MySQL regarding this approach. They seem to have gone great lengths to go this route. What issues and benefits you see of this approach and whether you ever see yourself taking this route? I, for one, am not entirely convinced of this approach and whether it can really scale down the road. Also, if it was someone other than Friend feed going down that route, I might have actually lost my tempered and yelled :)
Side note: Friendfeed is growing fast, and it would have been cool if Bret was speaking at one of the three upcoming MySQL events in April.
I'll be going into detail what is Sharding, how to Shard, pitfalls of Sharding, performance/throughput gains, shard roles, and performance scaling in general. I hope to make this the most comprehensive talk to date on the subject in 45 min.
The topic is called Scaling a Widget Company. I'll detail how I setup the data layer for Rockyou. How many transactions per second Rockyou is at, what the infrastructure is comprised of, how 99.999% uptime is achieved and hopefully get into BCP which I probably will not have time to go over.
If you want me to focus on specific aspects on the subject of shard'ing let me know and I will :).
Another project that I am doing introduces a new shard type. First, think of a Shard as RAID-10 on your database layer. The data is striped across N servers and mirrored for failure recovery.
Shard Types are Shards that serve specific purposes. For instance, I have an archive shard. This shard keeps data the is rarely ever read on even cheaper hardware-with slower disks and the data is naively compressed using the innodb_file_format=Barracuda option. If and/or when a person requests data that is archived, my software layer detects this case and migrates the archive data from the super crappy RAID-5 servers to the less crappy RAID-10 servers with 32 GB of RAM, all within 5-10 seconds.
Another shard type is the sub-shard. I use this terminology to define a different way of federating data for a predefined global object_id. A predefined global object_id is what you are federating by, for instance userid.
userid X maps to Shard 3
What if you had an application that is expected to take up 8TB of data? Sticking all of userid X's data on a single shard is very costly since current size of the cluster + 8TB means more servers are needed and is costly (more data, larger ranges, slower the avg query speed). So why not put it on another set of servers that have big slow'ish disks, i.e. the Sub-Shard.
Now for userid X
userid X maps to Shard 3 userid X maps to SubShard 1
Since some shard profiles do not require 100% uptime, I can run builds of mySQL that is cutting edge.
Using the Percona 5.1 build, I ran across new tweaks for XtraDB. The patch is from the Google Patch made by Mark Callaghan's Google Team. Mark's architecture is purely disk I/O bound, and from his tests, INNODB does not use I/O effectively. He added a bunch of code dealing with the I/O performance bottlenecks that innodb native has. The three new tweaks addressing the I/O bottleneck specifically are
innodb_io_capacity = 100 // If running without the patch this is the equivalent default behavior innodb_write_io_threads = 1 innodb_read_io_threads = 1
From what I gather, this means that DEFAULT INNODB uses one write thread 1 read thread and will only work on 100 pending iops at a time-which is probably why INNODB takes sooo long to shutdown since by default 90% of the buffer pool contains dirty pages.
If your servers have a large innodb_buffer_size, on the order of a few Gigs and the server has many spindles (greater then 1) then increasing these params may help your application purge dirty pages faster. Be warned; do not increase this to high. If your box has a lot of pending I/O and this new code does not have the bandwidth to flush based on your settings, the box will freeze. That means you will have to remove traffic off of the server, let the pending io finish, lower the settings and restart.
So, what is a good size to set this to?
Percona says 100*Number of spindles you have on the db server. That seems a bit high for me. I would say 100*Number of spindles / 2 - to start out with, then over time ramp it up.
Currently I am about to run 10 Shard or 20 servers with the Percona build. This will purely I/O bound load.
When you write good SQL, that use indexes properly there is one more obstacle that can slow down your app. The mySQL optimizer. From versions 3.23 to 5.1 the optimizer has been a problem for me. In mySQL 6.0 SUN/mySQL has resources improving it.
I wrote a post detailing how to pick indexes to get the most out of mySQL here.
Here is a post about the mySQL optimizer and what you can do to speed up your SQL SELECT statements.
What I would like to share with you today, is that UPDATE and DELETE statements can also use optimizer tricks that SELECT uses. Its not documented on the mysql.com but it is possible to do something like
UPDATE [YOUR TABLE] USE INDEX(`your_index_name`) SET col='val' WHERE [columns that satisfy your index].
To see if you have problems with your UPDATE statements taking to much time I recommend 1st running INNODB as your storage engine since SHOW INNODB STATUS will indicate what index your long running UPDATE is using. This can be achieved by looking for /RECORD LOCKS space id.* index/ of SHOW INNODB STATUS.
Another indicator that you might need to force the optimizer to do the right thing is to track "Deadlock detected" from concurrent update statements; look at the where clause of your UPDATE statement.
Next solve your issues by telling mySQL to use the correct index when setting exclusive locks via the USE INDEX statement above.
Oh how I love register_shutdown_function of PHP. This bad baby will execute at the end of the scripts in call order. So, how is this useful?
Say you want to log some action, but that logging DB is on another server. Additionally you do not want to take into account that writing to the other server may break your transaction because of timeouts if logged in the middle of a transaction. Another use case: you do not want to change a bunch of code in different places to batch all logging routines up.
public function __construct($platform){ parent::__construct($platform); register_shutdown_function(array($this, 'afterProcess')); }
public function afterProcess(){
foreach($this->getDataThatChange() as $key => $values){ $insert_data[] = $values; }
I have been using register_shutdown_function for years now, especially to clean up connections at the end of script execution. Since someone asked me if this was possible in PHP, I decided to post this quick usage of a cool php method.
If you have a slave, that slave is probably running with the read-only flag set in the mysqld-section of your my.cnf. To be able to write to a read-only slave you need to be the replication SQL_THREAD or have SUPER privilege.
Since 5.0.16, it is still possible to execute CREATE TEMPORARY TABLE on a read-only slave, so CREATE TEMPORARY TABLE privilege also allows you to write to a read-only slave in a limited and controlled way.
If you want to process a lot of data in a temporary table, you are probably creating the temporary table without any indices, then INSERT ... SELECT data into it, and then ALTER TABLE ... ADD INDEX afterwards, because that is usually faster than to insert data into a table with indices. Only that you cannot ALTER TABLE a temporary table, even on a server that is not read-only - in order to run ALTER TABLE on any table, even temporary onces, you need ALTER TABLE privilege which you might not want to give out lightly.
There is no reason at all to check ALTER TABLE privilege for an alter table operation on a temporary table, because that table is visible only to your connection and cannot be shared. It is also deleted when you disconnect. In fact there is no reason at all to check permissions for temporary tables. But it is done.
Because it is done, you can grant ALTER TABLE to a single table that does exist or even to a table that does not yet exist. If you grant ALTER TABLE privilege to an existing table on a read-only server, you cannot alter that table, because the server is read-only. If you then use CREATE TEMPORARY TABLE to create a table with that name, the temporary table will shadow the existing persistent table and for your connection the persistent table will become inaccessible.
The semantics of the GRANT will change, though, and will now apply to the temporary table, which is writeable on a read-only server because it is temporary, and is alterable because of the grant which was not meant to apply to it in the first place. Problem solved: I now can ALTER TABLE my temporary table on the read-only server after I have finished my data load.
All is well? Not!
There are multiple things at work here which I consider broken:
GRANTs are applied to temporary tables. This is not making any sense at all in my book. Temporary tables are connection-local objects and they cannot have grants applied to them which were always referring to persistent objects when they were made.
Temporary tables can shadow persistent tables in the namespace of a connection. Because GRANTS are tied to objects via the objects name and not an objects UUID or another form of truly unique object-identifier, GRANTS can refer to changing objects even when the grant does not change. This feels somehow broken, as in "not properly normalized". Does RENAME TABLE edit grant tables as well? I have to check!
By granting CREATE TEMPORARY TABLE privilege to a user I am allowing that user to shadow any other object within a schema. The temporary table will then pick up any rights granted to the shadowed object for the duration of its lifetime. This cannot be good.
So the good news is that the inaugural OpenSQL Camp is going to be an awesome event with mouth watering sessions by noted experts. The bad news (for me) is that I won't be attending it, which makes me sad. I cannot leave my town because my wife can go into labor anytime now.
The session list looks great! Congratulations and thanks to Baron, Sheeri, Ronald, all the sponsors and contributors for organizing the first OpenSQL Camp.
I will be watching PlanetMySQL closely for juicy blog posts. Hopefully, the one and only Sheeri is taking her camcorder!
Today I discovered Stack Overflow, a collaborative site that focuses on technical Questions. You can ask questions related to any language, apparently without having to register. The site is currently in beta. There are also a few MySQL questions that are currently unanswered.
I am currently tasked with writing Software Requirements Specification (SRS) document for a project. Effective sharding (based on specific criterion) and Scalability are key requirements of the project.
Scalability is traditionally classified as a non-functional requirement. My question to the community is that if scalability is crucial to a project, would it still be classified as a non-functional requirement? Are their cases when scalability requirements would be best classified as functional requirements?
I wrote an application that is able to send out 3-8 million messages an hour with only 10 CPU's. This application is a part of an Offline Task system that scales linearly.
How is this done, I'll go into detail hopefully at the mySQL conference if they accept my proposal.
The scope of this blog post is to go over building the "Task Queues". Currently I have 13 Queues, one queue for each Shard that I run. The data is federated by user or randomly federated with a GUID that lives as long as the job. A request came in to add 20 million jobs to the queue all at once. The problem is with this list, will I cause deadlocks in innodb as I add the jobs to the queue as one transaction? Can live traffic still write to this queue?
To verify that Deadlocks will not occur - having an understanding about how locks work in INNODB is key. I suggest reading this page.
To build the queue I dumped the data source by
SELECT identifier, 16 INTO OUTFILE "/data/mysql/BuildQueue.log" FROM SOURCE_TABLE WHERE CONDITION.
The isolation level is REPEATABLE-READ; I'm setting a shared lock to get the most current version of the data. Writes are not blocked.
Next:
I create a table on each shard where the queue is located.
CREATE TABLE IF NOT EXISTS OfflineTasksHold ( object_id bigint(20) NOT NULL DEFAULT 0, object_type smallint unsigned NOT NULL DEFAULT 0, PRIMARY KEY(object_id,object_type) ) ENGINE=INNODB;
Then I issue a command on each Shard
LOAD DATA INFILE "/data/mysql/BuildQueue.log" INSERT INTO TABLE OfflineTasksHold; START TRANSACTION; INSERT INTO OfflineTasks (object_id, object_type) SELECT * FROM OfflineTasksHold; COMMIT;
Each shard is getting around 4-5 million rows, while accepting real-time traffic of 20-60 tasks a second to the OfflineTasks table. The OfflineTasksHold table does not have any real-time requirements and is solely used to keep the queue in Primary Key order, plus there is a CHANCE that LOAD DATA could set an exclusive lock on the OfflineTasks table-shutting down adding data to the table by the live site. The INSERT sets an exclusive lock on the rows that are being added, so the Offline Task Sheppard - the process that pops tasks off the queue is blocked for a small period - which is acceptable. Why are they blocked? Well, the massive insert sets an Exclusive Lock. The Sheppard is trying to grab the rows that are locked waiting up to 50 seconds, until innodb_wait_timeout is reached. This condition is acceptable. All other inserts are able to go into the queue without a slow down.
In summary, the job queue is built and can be automated with confidence knowing that death to the various app will not occur. Processing slows down for a bit but speed right back up.
Today's "commercial quality disks" are amazing but they follow the same limitations as yesterday's disks. mySQL scales very well, but disks do not. So if you're IO bound when will your expectation of speed fail?
Test setup:
DELL 2950 PERC-6 HWRaid BBC 6 DISK 15K RPM 3.5" RAID-10 256K stripe across two channels-using WRITE THROUGH CACHE on mkfs.ext3 -T largefile4 Linux Filesystem.
The theory is that the outer part of the spindles is the fastest, and the inner portion is slower - since the outer is where the data starts (thanks for the info Benjamin Schweizer). Thus one can conclude that the more disk space your application(s) use the slower the throughput, since the heads have to move more. Brad F. my co-worker did a benchmark to prove this. Our goal is to find out at what is the saturation point if our expectation is to have 22 MB / sec of random access.
Why do we want 22MB / sec of random access throughput? We want to guarantee a certain level of performance when adding new apps to a common backend-which is I/O bound: we need to know when things will break.
Here is what Brad found: Total disk size for our RAID-10 setup =~ 800G. What point does it FAIL to achieve our expectations of sustained 22MB/s?
rndrw test across 100G test / 750G LV =~ 35 MB/s # outer part of the spindles rndrw test across 100G test / 300G LV =~ 32 MB/s # outer part of the spindles rndrw test across 250G test / 300G LV =~ 24 MB/s # sweet spot rndrw test across 350G test / 384G LV =~ 21 MB/s # saturation point rndrw test across 750G test / 800G LV =~ 14 MB/s # waste of space
In conclusion these test show that even though a RAID-10 setup with 800G of space is available, the expected performance drops when data exceeds the sweet spot of 250G-300G of 800G usable-data array.
Disclaimers: There are many factors that can raise or lower the bar, like different file systems, different I/O schedulers, flushing. For my setups I like
Deadline I/O scheduler 256K Stripe few inodes (don't need them) ext3 since that’s what stable and available.
Tomorrow morning I am presenting a session, Startup Scalability Strategies, at Startonomics, a conference being organized by Dave McClure and Deal Maker Media. The sessions will be streamed live using UStream. Check the Startonomics website at http://startonomics.com for more details.
Let me first start of with the disclaimer, that I do not use memcache to scale, I use it to reduce latency. I'm of firm belief that the database layer should be able to handle the requests, while memcache is used to keep frequent requests returning in a consistent time frame i.e. reduce I/O spikes.
Capacity planning is key to making sure your site can serve the requests to users. If the site is slow, or down that is loss revenue in any revenue model used to monetize your product.
How to determine when you need to add more memcache servers.
The stats I look at are system stats and memcache stats.
Memcache is Memory / network heavy. CPU spikes are very low, and if the CPU starts maxing out that is probably due to some sort of network driver issue or huge context switching or large values stored in memcache.
This server dedicated to memcache. The context switching is huge due to all of the constant requests-but we are talking about modern day CPU's which can context switch like crazy. The thing that is bugging me is that requests are starting to go into the run queue, not at a alarming rate but still this is an indication of some possible issue.
This is something that is graphed on ganglia. If the run queue on average starts increasing, there is some problem.
Next stats from memcache.
/** * @desc get extended status from all servers */ public function CacheGetStats(){
if ($GLOBALS[cfg][disable_feature_memcache]){
return true;
}
return $this->memcache_obj->getExtendedStats(); }
I have a class called Cache which is a wrapper around memcache class calls. Cal Henderson would kill me if I was using classes at Flickr. Don't get me wrong I agree with Cal 100% but the environment I am in now requires classes-so I have to use it. The reason why we don't like classes is for another post.
Notice on this server we have a good hit rate and no evictions. Yet looking at one server is not good enough, look at them all- the reason more memcache servers means more memory to store data for your application. The CRC32 hash that the PHP memcache client uses is not very even and some keys may be requested more.
Take a look at this server. The evictions are high, indicating that memcache needs to make room for new objects. This is not good, its an indication that the LRU is evicting objects out faster then their expire time. Additionally the memcache gets are much greater then the hits. This is an indication that memcache is not really working as good as it can.
But one server is not an indication that there is a problem. Looking at the system as a whole is to determine if a problem exists. My rule of thumb is if the 30-40% of the servers have a high eviction rate, its time to add 30-40% more servers or memory.
Now allot of this can be tuned by changing the slab size, but learning from John Allspaw, don't make a plan based on a possible gain, make a plan based on the current usage. Then if the possible gain works your golden.
How do you base your stats on adding more memcache servers?
I had a lot of good experience using fabforce's DB designer. Now that I upgraded to mySQL workbench, I have nothing but good things to say.
In 5 mins I was up and running after install. I took my mysqldump of my schema
mysqldump -d --no-add-drop-tables
Clicked File->Import->Reverse Engineer MySQL Script
Tada a picture of my schema.
Pictures are always nice to represent what your conveying to individuals who may not be as experienced with the db structure as yourself. Just for this feature alone I recommend using Work Bench.
I attended several events in June of this year including Graphing Social Patterns East, Velocity and Structure 08. At each of these events, I tried to take some notes and posted them to my personal blog. I received a few pings from readers of this blog to point them to a list of these posts. It took some time but here is the list of my notes. In some cases, I have linked directly to the presentation files.
Working the Clouds: NextGen Infrastructure for New Entrepreneurs: This panel on cloud computing featured panelists including Geva Perry (GigaSpaces), Jason Hoffman (Joyent), Tony Lucas (XCalibre), Lew Moorman (Rackspace), Christophe Bisciglia (Google / AppEngine) and Joe Weinman (AT&T). Christophe got grilled heavily by other panelists but he handled it pretty darn well.
Werner Vogels: Keynote at Structure 08: Dr. Vogels keynote was one of the highlights of Structure 08. He presented case study of Animoto and talked about the 70/30 switch among other things.
The Platform Revolution: A Look into disruptive technologies: Jonathan Yarmis AMR Research (VP of Disruptive Technologies) talked about technology trends, social networking, mobility, mobile , cloud computing, stream computing, business models, user 2.0 and the new enterprise reality.
Green Data Centers: Bill Coleman (Cassatt Corporation) presented this session. Bill is known for being responsible for the B in BEA. (Presentation slides)
Creating Bebo Applications: Bebo is now part of OpenSocial and this presentation presented at Graphing Social Patterns talks about how to create applications for Bebo.
Open Social and Google App Engine: Patrick Chanezon (API Evangelist) and Paul McDonald (Product Manager for Google App Engine) presented a technical overview of OpenSocial and Google App Engine at Graphing Social Patterns East.
OpenSocial: Open for Business: In this session, panelists were Patrick Chanezon (Google), Paul Lindner (hi5), Max Newbould (MySpace) and Sachin Rekhi (imeem). (Also see)
Mobile Social Networks: A Comparison: Benjamin Joffe's excellent eye opening session for anyone interested in using mobile platform for creating social networking solutions.
Top 5 Things that fail and win on social networks: Dave McClure, chair of Graphing Social Patterns, presented this concise but every effective presentation on what fails and what wins on social networks.
Widget Strategies & Social Platforms: Hooman Radfar, CEO of Clearspring Technologies discussed the new role of widgets and how to go about creating them.
Scaling MySQL - powered Web Sites by Sharding and Replication: Slides from Peter Zaitsev's session at Velocity. (Presentation slides)
Capacity Management: John Allspaw's signature presentation on capacity management. John also has a book coming out on this topic. (Presentation slides)
LinkedIn Communication Architecture: Slides about LinkedIn's platform built in Java. (Presentation slides)
SOX Compliance: A presentation by Skye Rogers. I missed this presentation but then caught up with Skye at dinner. I wish Skye would have received more time to discuss SOX Compliance. (Presentation slides)
There were several sessions I didn't get to go to which is a sad thing. You may want to check the conference websites directly (linked at top of this post) to see if there are presentation slides available. Also, if you took notes at these sessions, please feel free to drop the links as comments to this post.
So running mysql-5.1.24rc with INNODB-Plugin-1a I ran into "Can't initialize function 'INNODB'; Plugin initialization function failed", and noticed a more serious issue with the whole plugin architecture.
Before without plugin if the engine died so did mysqld. Now, mysqld can stay up and accept connections. But what if you have 1000 mysql instances, and all of your Nagios checks where to test if mysqld was up and accepting connections? This entire test is now invalid. I now need a Nagios check to test if the plugin-engine is up, by doing a write to the engine.
Now to the cause of the issue:
Here is my condition. 5.1.24rc with INNODB-Plugin was killed by OOM killer (mysqld memory leak?). When it came back, my expectation was for innodb to go through a recovery process, it didn't. Strange. Then I got this message from replication.
So I did a show plugin command and innodb showed that it was installed. Since I'm out of ideas I thought that I would unload the various INNODB plugins.
Tada found the problem, I could not uninstall them something is wrong with the plugin. So, to make everything smooth for a reinstall of the plugin.
1. TRUNCATE mysql.plugins // I don't know if this is needed 2. restart mysql 3. INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
Now I have an error InnoDB: mmap(26600275968 bytes) failed; errno 12 InnoDB: Fatal error: cannot allocate the memory for the buffer pool
Weird
I look at whats allocated. mysqld is consuming 11GB when nothing is loaded, and the various key buffers are sized very low. So, just to test a theory I lower the innodb bufferpool from 26GB to 19GB. Since 19+11 = 30GB and I have 32GB total.
1. start mysql 2. INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
Now it hangs. But, now this looks a familiar failure, I think its now recovering, so I look in the mysql error log and indeed it is recovering!
I put this bad boy into production, and its working great.
So, I had an app that was sticking data into S3 when the data was marked or qualified as warehouse data. The problem with this method was when retrieving the data from S3 it was not chunked but appended to. So pulling 100M over the wire, parsing, sorting and adding to the db would fail since the process consumed more then its allowed share of resources. Plus grab data from S3 has a very high response time, plainly put its slow.
So, since the warehouse data is prone to error-the main reason why I needed a new solution, I decided to use ZLIB compression that is native to Innodb's Barracuda build.
I use utf8 for everything, KEY_BLOCK_SIZE will define how much extra overhead the CPU will have to work, think of this as a compression level. I use 4K since some rough benchmarks I've done shows that this is the sweet spot for my INSERT rate + Select rate.
Finally I stick the entire dataset on a slow RAID-5 array. I use RAID-5 because when I want the data-I want to get it fast and put it into the production dataset real-time. I need the disk space and redundancy. The writes are not extraordinary and not very expensive from a I/O perspective. I'm getting 60% reduction in the INNODB table space. On top of that the format of the data is the same format as the production system. So I can pull this data fast write to the shard in a single httpd process. I can migrate archive data to production quality in less then a second for every 3000 rows. I qualify this as a win. We'll see how stable this is over the next few weeks.
So in conclusion, I've replaced S3 with a cheaper solution, faster solution, I put archived user data on cheap servers, and as a result the price per user goes down, since my capacity plan is based off a function of disk growth-on my fast disk cluster. Next I can store my backups on S3 instead of semi-live data :)
“Funny how Amazon doesn't use S3 to store any assets for amazon.com”tweet by @gruber
Amazon's S3 suffered a major outage today knocking many websites offline. S3 outage started at approximately 12:00 PM EST and the last time I checked at 11:11PM EST, Smugmug, a popular photo hosting site that extensively uses S3, was still down.
Currently, there are several great opportunities with exciting companies available in the New York area. If you're a rock star Java/PHP/Ruby developer or a pixel-obsessed designer, contact me at your earliest convenience.
Web Developer:
Give Real is a well-funded startup in the midst of an exciting period of growth and success. Our technology uses a patent pending platform that combines the ubiquity of credit card transactions and the power of social networks to create a new gifting experience.
Our primary platform is Rails, but there are programming challenges that range from SOAP APIs to Facebook application development. We are searching for full-time developers with expertise and broad experience in:
* Ruby on Rails (we also use rSpec, Starling, Memcache) * MySQL * xHTML & CSS, and comfort with Javascript * Team development with tools like Git & Trac
In addition, we are also interested in candidates who have:
* Expert Javascript skills * Java & SOAP experience * Experience scaling with Rails, or any other web platform * Comprehensive Linux knowledge * UI and graphic design backgrounds
We are willing to pay top-notch developers very competitively (plus the possibility of options) to join our team and help write code that will be used by hundreds of thousands of users within a few months. We are ideally located in downtown Manhattan less than a minute walk from the BDFV and NRQW lines
Also, if you know someone who may be a good fit for us (developer or graphic designer), we are offering a $1000 referral reward for anyone we hire.
Please contact us at jobs@givereal.com
Graphic Design:
Give Real is a well-funded startup in the midst of an exciting period of growth and success. Our technology uses a patent pending platform that combines the ubiquity of credit card transactions and the power of social networks to create a new gifting experience.
We're searching for full-time designers with experience in:
* Design for advertisements * Design for consumer focused websites & applications * xHTML & CSS coding * HTML & design for emails * Working on top of an MVC or template system (we use Rails)
In addition, we are also interested in candidates who have:
* Team development with tools like Git & Trac * Comfort with Javascript programming * Rails programming experience
We are willing to pay top-notch developers very competitively (plus the possibility of options) to join our team and help design the look and feel of a service that will be used by hundreds of thousands of users with a few months.
Also, if you know someone who may be a good fit for us (RoR developer or graphic designer), we are offering a $1000 referral reward for anyone we hire.
Please help save Ivan, son of Andrii Nikitin (MySQL Support Engineer), who needs a bone marrow transplant. Andrii's message is below:
"My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.
"8 months of heavy and expensive immune suppression brought some positive results so we hoped that recovering is just question of time.
"Ivan is very brave boy - not every human meets so much suffering during whole life, like Ivan already met in his 2,5 years. But long road is still in front of us to get full recover - we are ready to come it through.
"Ukrainian clinics have no technical possibility to do such complex operation, so we need 150-250K EUR for Israel or European or US clinic. The final decision will be made considering amount we able to find. Perhaps my family is able to get ~60% of that by selling the flat where parents leave and some other goods, but we still require external help."
-- Andrii Nikitin, MySQL Engineer
Please remember, every little bit will help the family pay for Ivan's operation! Be as generous as you can.
Today at 1PM EST I am presenting the second part of memcached for MySQL webinar. I was told that the registration numbers look as good as the previous one. This one will be a bit more technical than the previous webinar. Sorry for the late notice but hope you can join!
In just a few minutes, I will be leaving for Graphing Social Patterns East, a conference by Oreilly. Dave McClure of 500 Hats is the conference chair. I plan to meet old friends and make new ones. It should be a lot of fun. More about Graphing Social Patterns.
Ever wish you could have a browser based shell for Google? One that was clutter and advertising free? Say hello to Goosh, one of the coolest service to hit the web.
It even recognizes 'clear' :) For now, I am addicted to it.
Disaster is really inevitable. Even with all the redundant power investments, ThePlanet (formerly EV1 and RackShack), had to shut down their backup generators at their H1 data center on the instructions of the fire crew. This happened after a wire-short in fault transformer led to an explosion that knocked off one of their walls, ultimately bringing 9,000 servers down. Luckily no one was injured.
This just goes on to show that just because a data center has redundant power and backup generators, it does not mean that a disaster cannot happen. IIRC, ThePlanet's last disaster was blamed on backup generators not kicking off properly.
While there was no damage to servers, I wonder how many MyISAM repairs need to be triggered once the servers do come back online?
Michael Arrington of TechCrunch asks Twitter a few questions. I have only included a sample list below but you should read his blog post for all the questions:
Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fail?
A 'yes' answer to any of these questions by Twitter would be disturbing to say the least. However, it won't be surprising as companies expect databases to just somehow magically work without creating and supporting a proper architecture. High availability doesn't comes cheap and reputation for companies is everything.
I find it amusing that Twitter isn't even looking for a DBA. May be that's considered a job for the SA over there :)
A big thank you to all those who attended the memcached webinar today on which I was a panelist. I was told that there were more than 560 registrants.
The feedback I received directly and indirectly shows that there is a lot of interest about memcached. In the future, I hope to work again with MySQL/Sun on more memcached related webinars.
If you attended the webinar and have some suggestions, comments or questions, please contact me at fmashraqi at yahoo dot com or post a comment on this blog.
Special thanks to Jimmy Guerrero, Monty Taylor, Rich Taylor, Edwin DeSouza and Alex Roedling for their hard work in arranging the webinar. Also thanks to Brian Aker, Matt Ingenthron and Trond Norbye for their assistance at various phases.
Regarding my earlier post on memcached webinar, I was informed today that more than 420 registrants have signed up. Space is limited and filling up fast so if you are interested in memcached and haven't registered yet, click on the following link to register now!
Today, many large scale web 2.0 properties (including my employer) save millions of dollars by depending on memcached to bring their application response time under control and to offload pressure from databases.
There are several success stories about using memcached to speed up database driven websites. Facebook, for instance, runs the largest memcached installation and the numbers only keep increasing. In May 2007, Facebook was reportedly running 200 dedicated servers with 3TB of memory in their memcached cluster. At the "Scaling MySQL Up or Out" Keynote, Facebook revealed they are now using 805 dedicated memcached servers. That's more than a 400% increase in less than a year!
For my employer, memcached has been a crucial component of the infrastructure that has been instrumental in handling explosive growth in a cost-efficient manner. In addition, memcached has helped us offload billions of queries from our database.
To highlight several real-life use cases of memcached (see below), I will be presenting a memcached webinar on Thursday, June 29 at 1 PM EST (10 AM PST). Monty Taylor (Senior Consultant, Sun Microsystems) and Jimmy Guerrero (Sr Product Marketing Manager, Sun Microsystems - Database Group) will also be speaking at the event. Space is limited and filling up fast (200+ registrants already) so I highly recommend registering now.
In this webinar, I will be covering several use cases for memcached including (but not limited to):
deterministic cache
non-deterministic cache
proactive cache
"state" cache
filesystem cache replacement
Hope to "see" you at the webinar. Note: This memcached webinar is not to be confused with the memcached webinar being presented by Ivan Zoratti on June 28.
At the MySQL Conference and Expo, right after my participation in scaling up or scaling out keynote panel, I talked to Sun's Multimedia team about Sun and MySQL in our environment.
Recently, I found the interview on Sun's Multimedia page. The video of my discussion is embedded below:
Over the weekend I took some notes from a presentation and did some research from various sources. The result was a blog post about Internet trends that I posted on my personal blog. There are some very interesting statistics about Internet usage and social networking. Also, Facebook fans will find some interesting facts as well.
When InnoDB creates a new transaction it is not yet committed. The database has not yet made any promises to the application and so we do not really have to make anything persistent so far.
To be fast InnoDB tries to assemble the transaction in a memory buffer, the innodb_log_buffer. It should be sufficiently large that you actually can assemble such a transaction in memory without needing to write it out in part into the redo log. A size of 1M to 8M is normal.
Once a transaction is to be committed InnoDB has to read the page from disk which contains the image of the row that is being changed. It then has to actually make that change in memory. The changed page is cached in a memory pool called the innodb_buffer_pool. This pool also caches unchanged pages that have been accessed by read operations. All of these pages on disk and in memory are 16K in size and the innodb_buffer_pool_size determines how much RAM we will use as a cache for such pages - usually as much as we can spare.
InnoDB does transactions. Meaning: It collects statements working on InnoDB tables and applies them on COMMIT to all tables "at once". Either all of these statements inside one transaction succeed ("commit") or all of them fail ("rollback"), changing nothing.
By default, the database is in AUTOCOMMIT mode. Meaning: The server sees a virtual COMMIT command after each statement. You can disable autocommit completely, or you are starting an explicit transaction inside autocommit using the BEGIN statement.
MySQL offers a variety of storage engines giving you a lot of flexibility in managing your storage and data access needs. Still I encounter customers who are not using this flexibility when they should, because they lack information about the advantages of Non-MyISAM storage engines or which are using storage engines like InnoDB as if they were using MyISAM.
This is the introductory article in a series of texts that will hopefully once become an InnoDB tutorial. A german version of this article is available in my german language blog.
An InnoDB Tutorial
The InnoDB storage engine is an engine that can be operated ACID compliant, does transactions and foreign key constraints. It is useful for all applications that do online transaction processing or have a high rate of concurrent write accesses for other reasons.
PHP is a nice programming language for web applications with a large number of databases supported.
PDO is one of many database access abstractions trying to unify the way PHP talks to databases. Unlike most of these, PDO is written in C and not in PHP.
Wez Furlong wants to improve PDO "Version 2". In particular he and others want to produce an open spec that can be used by anybody including database vendors to code against, add unit testing for the spec and the drivers that claim to support it and improve metadata handling for PDO.
In order to get database vendors, notably IBM, into the game he proposes a Contributor License Agreement to make it easier for vendors to commit into the PDO part of PHP. There exist corporate and individual versions of the CLA, and the PDO license is yet another license that is different from the PHP license. Wez has prepared a FAQ covering questions regarding all that. Continue reading "PHP PDO V2 CLA"
My thinking was that the topic is already dead, but people have strange ideas off and on again. Have a look at the S9Y boards where you'll find someone who wants to have a storage backend "LDAP" for S9Y. This is sick and wrong! Let me explain why.
According to my findings in Bug #31876, MySQL does not commit data to disk in Windows using the same method MS SQL Server and DB/2 are using. The method MySQL uses appears to be seven times slower in pathological scenarios.
The bug report contains a patch - thanks to the MySQL WTF (The Windows Task Force) and the lab provided by the customer for helping me to find that.
Does this work for you? I want to hear about your test results.
Lately, I have had opportunity to evaluate a very large Ruby installation that also was growing very quickly. A lot of the work performed on site has been specific to the site, but other observations are true for the platform no matter what is being done on it. This article is about Ruby On Rails and its interaction with MySQL in general.
One of the major contributing factors to the success of MySQL is the ease and simplicity of its replication. Read-slaves for scaleout and backup-slaves for noninterrupting backups are the norm in any MySQL installation I have seen in the last two years.
So how does replication work? And how shall it be expanded in the future?
What is available?
The binlog written by MySQL currently logs all statements changing the tablespace. It is a serialization of all tablespace changes. The binlog position, expressed as (binlog name, offset), is a database global timestamp - a timestamp expressed in seconds.fraction does not work for any precision at all, because on a multi-core machine multiple things can happen concurrently.
If you want to make a consistent full backup of the database, the database must not change during the backup. That is, it must be possible to associate one and exactly one binlog position with the backup.
In fact, if you have such a backup - one associated with a binlog position - and you happen to have the binlogs from that time until now, it is possible to do a point-in-time (PIT) recovery. You'd recover from the full backup and you'd then replay the binlog from the backups binlog position until now. That is why it is important to store the binlog in a filesystem that fails independently from the rest of your MySQL. That's also why you must not filter the binlog that is written by MySQL using binlog-do-db and binlog-ignore-db - if you do, you'll get an incomplete binlog that will fail to be useful in a PIT recovery scenario.
A slave in MySQL is now nothing but a binlog downloader and executor: The slave must be restored from a PIT-capable full backup. It is then being told the current binlog position and where to log in to get the missing binlog. The slaves IO_THREAD will then log into the master server and download the binlog to the local disk as fast as possible, storing it as the relay log. The slaves SQL_THREAD will then start to execute the relay log as fast as possible. Replication can thus be thought of as an ongoing live recovery.
In MyISAM, we do have LOAD INDEX INTO CACHE. In InnoDB this does not work. For benchmarking I often require a way to preload the innodb_buffer_pool with the primary key and data after a server restart to shorten warmup phases.
According to Blackhole Specialist Kai, the following should work:
CODE:
mysql> create table t like innodbtable;
mysql> alter table t engine = blackhole;
mysql> insert into t select * from innodbtable;
Another win for the unbreakable BLACKHOLE storage engine.
Sorry for the low maintenance level on this site, personal matters have limited my available time. I plan to now come back to more active maintenance of this site, beginning with an update to the forums.
The forums now allow for registration without admin approval but have a new requirement that those registering must answer a simple question to submit their registration. This should hopefully allow new members to join without waiting for their approval, while also preventing spam registration.
When I return from the MySQL Conference I’ll be restoring the long-missing samplecode section to the site and restoring the three-parts of the Yatta tutorial.
One of the recent issues wiped out my copies of Yatta that were sent to me by you the readers. If you previously sent me Yatta, please re-send to mike@thisdomain.
During the site failure the articles and source for the Yatta series was lost, so I thought I would put a call out: do you have a local copy of the articles and source for the Yatta series? If so, please comment here or send me an email!
I didn’t have a chance to restore the full structure to vbmysqldirect’s project pages yet, but I have posted an archive of the project pages to our new Projects page.
Good news everyone! At this point most of the articles are restored. I’ll be finishing the restoration of the last two or three articles soon and moving on to a restoration of the sample code and the project pages (vbmysqldirect in particular).
Thank you for waiting so far, and if you find any errors in the restored articles please let me know.
Those of you who have been around for a while will likely know that part 4 of the tutorial had some code that became outdated when VB2005 came out of Beta. I’ve updated the code samples in the article and its zip file but lack the time right now to do a full validation due to work.
As such, I need your help. If someone out there could be so kind as to start at Part One and work through the tutorial using the current release of VB2005 to check for issues it will be much appreciated.
I’ll try to get parts 5-7 up soon and move on to the remaining article content. Then I’ll work on getting samplecode online.
Also, new content is always appreciated. If you have an article or sample you want to submit feel free to let me know and I’ll create a contributor account for you in the CMS.
I’m also looking for someone to manage the Spanish language articles and moderate the new Spanish language topic in the forums.
I’ve installed forums to http://www.vbmysql.com/forums. Because of the new hosting arrangements and the difficulty in restoring the previous forum software, I’ve changed backends. This means you’ll need to re-register and recreate any posts you were still looking for answers to.
I’ve restored two articles so far and would appreciate any feedback as to the appearance/style of the articles posted so far. Since I’m rebuilding site design as I restore articles, now is a good time to voice your opinion.
I went on to test the program in every way I could devise. I strained
it to expose its weaknesses. I ran it for high-mass stars and low-mass
stars, for stars born exceedingly hot and those born relatively cold.
I ran it assuming the superfluid currents beneath the crust to be
absent -- not because I wanted to know the answer, but because I had
developed an intuitive feel for the answer in this particular case.
Finally I got a run in which the computer showed the pulsar's
temperature to be less than absolute zero. I had found an error. I
chased down the error and fixed it. Now I had improved the program to
the point where it would not run at all.
-- George Greenstein, "Frozen Star: Of Pulsars, Black
Holes and the Fate of Stars"