Some weeks ago I've resigned from Sun Microsystems, where I worked as a MySQL Telecom Sales Engineer. Beginning July 13th I will take on a new role in the MySQL ecosystem, as "Project Manager and COO" for Monty Program Ab.
Since it can be argued I'm moving to a competitor, and I was working in a customer facing role, upon resigning I had to immediately return my computer, phone, etc to Sun, and my login accounts, including email, were terminated. I've gotten some comments about this (like: "Is there no trust in the world anymore?" and "How can you treat anyone like that?") and I cannot stress this enough: This is normal, I expected it, indeed brought it to my managers attention myself. If nothing else, think of it this way: This precaution also protects me from any misunderstandings and/or false accusations. Besides, starting your vacation by returning your laptop and removing your email account isn't the worst thing to happen to you - try it, you'll love it :-)
This had in any case the side effect that I was not able to honor a long standing tradition of sending a goodbye and thank you note to a certain internal mailing list, because I cannot do that from my private email. (Also asking others to forward a mail proved to be problematic/sensitive in many ways.)
Hence, I'm posting the email I wrote here instead, so that both my former collagues and other MySQL community members can read it alike:
A key difference in Drizzle is the definition of utf8 as 4 bytes, not 3 bytes as in MySQL. This combined with no other character sets leads to an impact on the length in keys supported in Innodb.
During a recent test with a client, I was unable to successfully migrated the schema and provide the same schema due to unique indexes defined for utf8 VARHAR(255) fields.
Here is the problem.
mysql> create table t1(c1 int unsigned not null auto_increment primary key, c2 varchar(255) not null, unique key (c2)) engine=innodb default charset latin1;
Query OK, 0 rows affected (0.05 sec)
mysql> create table t2(c1 int unsigned not null auto_increment primary key, c2 varchar(255) not null, unique key (c2)) engine=innodb default charset utf8;
Query OK, 0 rows affected (0.03 sec)
mysql> create table t3(c1 int unsigned not null auto_increment primary key, c2 varchar(256) not null, unique key (c2)) engine=innodb default charset utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
drizzle> create table t1(c1 int not null auto_increment primary key, c2 varchar(255) not null, unique key (c2)) engine=innodb default charset latin1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your Drizzle server version for the right syntax to use near 'charset latin1' at line 1
drizzle> create table t1(c1 int not null auto_increment primary key, c2 varchar(255) not null, unique key (c2)) engine=innodb;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
Only a maximum of 191 is now possible.
drizzle> create table t1(c1 int not null auto_increment primary key, c2 varchar(191) not null, unique key (c2)) engine=innodb;
Query OK, 0 rows affected (0.03 sec)
drizzle> create table t1(c1 int not null auto_increment primary key, c2 varchar(192) not null, unique key (c2)) engine=innodb;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
This is the 1st publicly available version of Fosimo.TR — free twitter client for windows ultimate tool which will help you keep track of your twitter account without browsing through twitter.
Be updated instantly with your Twitter friends status updates.
Synchronize Twitter friends with matching OUTLOOK contacts. For windows mobile devices owners - you should be able to see your contacts photos on your phone display.
Autoupdate (Fosimo.TR autoupdates each time a new version is available).
In addition
We will never ask you for your Twitter password and you don't need to enter one on Fosimo.TR (because Fosimo.TR fully supports oAuth authentication technology). Your Fosimo.TR will be updated automatically each time there is a new version of the product. There is no need to download uninstall and reinstall the tool each time.
Unless you have been hiding under a rock, you have one way or the other heard about the king of pop passing away. This is in my opinion a great loss to the world, as this man was like a god to me when I was a little boy.
Besides it being really sad that he died, it is also a good test for the systems us techies take care of: the web has seen some of it’s biggest spikesin traffic on june 25th.
That is not over yet though. Next tuesday there will be a memorial service in LA, which will also be live broadcasted on the web. For any system administrator and/or DBA responsible for a site that is news or social media related, this will be yet another good real-life load test.
Rest assured that many people will find their favorite news site or social media application to get the latest on this event.
Now might be a good time to make sure you are ready for a spike like that, and if you were planning a day off for tuesday, you might want to inform your replacement for the day about your phone number. Then again: if your design is dodgy, you are in trouble anyway.
I was trying to demonstrate to a client how to monitor queries that generate internal temporary tables. With an EXPLAIN plan you see ‘Creating temporary’. Within MySQL you can use the SHOW STATUS to look at queries that create temporary tables.
There is the issue that the act of monitoring impacts the results, SHOW STATUS actually creates a temporary table. You can see in this example.
mysql> select version();
+-----------------+
| version() |
+-----------------+
| 5.1.31-1ubuntu2 |
+-----------------+
1 row in set (0.00 sec)
mysql> show global status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 48 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 155 |
+-------------------------+-------+
3 rows in set (0.00 sec)
mysql> show global status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 48 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 156 |
+-------------------------+-------+
3 rows in set (0.00 sec)
What has perplexed me in the past, and I can’t explain is that SHOW SESSION STATUS for this example does not increment. It’s confusing to tell a client to use SHOW SESSION STATUS for SQL statements, but the behavior is different with SHOW GLOBAL STATUS. For example, no increment.
mysql> show session status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 2 |
+-------------------------+-------+
3 rows in set (0.00 sec)
mysql> show session status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 2 |
+-------------------------+-------+
3 rows in set (0.00 sec)
Let’s look at a query that creates a temporary table.
mysql> explain select t1.* from t1,t2 where t1.c1 = t2.c2 order by t2.c2, t1.c1;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 3 | Using temporary; Using filesort |
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 3 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.03 sec)
If we use session status we get an increment of 1.
mysql> show session status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 2 |
+-------------------------+-------+
3 rows in set (0.00 sec)
mysql> show session status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 2 |
+-------------------------+-------+
3 rows in set (0.00 sec)
mysql> select SQL_NO_CACHE t1.* from t1,t2 where t1.c1 = t2.c2 order by t2.c2, t1.c1;
Empty set (0.00 sec)
mysql> show session status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 3 |
+-------------------------+-------+
3 rows in set (0.00 sec)
If we use global status, in this case it’s and idle server so I know there is no other activity, however in a real world situation that isn’t possible.
mysql> show global status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 48 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 171 |
+-------------------------+-------+
3 rows in set (0.00 sec)
mysql> select SQL_NO_CACHE t1.* from t1,t2 where t1.c1 = t2.c2 order by t2.c2, t1.c1;
Empty set (0.00 sec)
mysql> show global status like 'created_tmp%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 48 |
| Created_tmp_files | 5 |
| Created_tmp_tables | 173 |
+-------------------------+-------+
3 rows in set (0.00 sec)
Josh Berkus writes, “Now that PostgreSQL 8.4 is out, I thought I’d write a little about my favorite 8.4 feature. As Mr. Performance Whack-a-Mole, what makes me happy about 8.4 is the ability to whack moles faster … which is why I’m very fond of pg_stat_statements.”
On ad’s corner, Andreas Scherbaum says, “Up to PostgreSQL 8.3 it was only possible to grant (and revoke) permissions on the entire table. If column level permissions were needed, a workaround like a view solved (more or less) the problem . . . This . . . is uneloquent, error prone and does not scale well. . . . PostgreSQL 8.4 solves the problem with a shiny new feature: column level permissions.”
David Fetter looks into WITH (so much drama in the CTE): ” By now, you’ve probably seen that PostgreSQL 8.4 can produce Mandelbrot sets . . . but what are Common Table Expressions really about? [They] are just views you create at run time. Oh, and they can refer to themselves. That latter capability is powerful and flexible (read confusing), so I’ll go over that one later. Run-time views are pretty handy, though.”
Andrew’s PostgreSQL blog introduces parallel pg_restore for PostgreSQL 8.4: “I try to complete at least one significant feature item per PostgreSQL release. This time the feature is making pg_restore run in parallel. This is quite important for many users, particularly some large enterprise users.”
SQL Server
Eric Johnson introduces SSIS 2008 and the new lookup: “SSIS is a pretty useful tool for designing ETL processes. One of the transformations I was disappointed with in 2005 was the lookup. I found it a little better than the lookup functionality in DTS, but not too much better. Well, Microsoft must have read my mind because SQL Server 2008 has a new shiny Lookup transformation that is, in my opinion, much improved.”
Kimberly L. Tripp was thinking about looking for security vulnerabilities in database code. “I’ve always been concerned with security and I’ve always stressed the importance of auditing the REAL user context not just the current user . . . So, I generally try to avoid using dynamic string execution and if necessary create well tested/protected parameters . . . Having said that, what if I’m looking at a database for the first time . . . I’ve come up with a quick query… And, while it’s not going to “solve” your problem . . . or even truly verify if you’re vulnerable, it gives you a ‘quick list’ of where you should look first! ”
Adam Machanic exposed the hidden costs of INSERT EXEC, beginning, “INSERT and EXEC: relational brothers in arms, helping you get your data and then put it somewhere. But like all brothers, their relationship has its ups and downs and sometimes you need to look just below the surface to see the real issues. In this post I will cover hidden penalties that a decision–seemingly purely aesthetic–can bring to the fore.”
Data Management has a first-rate HOWTO on dynamic column names and fields in SSRS. “I had no choice but to do work with creating dynamic column headings and dynamically determine what field in my dataset should go where in a report today. Sense [sic] this is the second time I’ve gone through this exercise and knowing the lack of information out there on really how to do it, I thought it deserves a blog entry. . . . So here is the basis of the requirements. You have a query that uses PIVOT but thrown into the mix is the need for dynamic columns in the PIVOT.”
On xaprb, Baron Schwartz has a review of MySQL Administrator’s Bible. “I think that MySQL Administrator’s Bible is a very good complement to High Performance MySQL, 2nd Edition. The former is useful to people who have varying levels of knowledge, while the latter assumes a lot of experience and doesn’t cover introductory material much. . . . So if you’re new to databases, or if you’re new to MySQL, you might do well to start with Sheeri’s book . . . ”
Over on the MySQL Performance Blog, Baron looked into gathering queries from a server with Maatkit and tcpdump: “For the last couple of months, we’ve been quietly developing a MySQL protocol parser for Maatkit. It isn’t an implementation of the protocol: it’s an observer of the protocol. This lets us gather queries from servers that don’t have a slow query log enabled, at very high time resolution.”
Ronald Bradford looked at verifying MySQL replication in action, with “ . . . a very simple test to show MySQL replication in action and to also better understand the basics of MySQL Replication. With a configured MySQL environment we can run the following test on your MySQL master, and monitor the MySQL slave.”
H. Tonguç Yilmaz asserted, Instrumented Code is Better: An Example on How to Generate Session Level ASH Reports. ” After 10g if you have its license ASH based performance monitoring is an alternative to the traditional SQL Tracing in my opinion. You may either use EM to generate ASH reports or directly query ASH dictionary. The handy additional hint here is that you can set filters to ASH data in order to focus on a specific session, the filter options at ASH report screen at EM can be used for this purpose or below is an example to monitor the waits from v$active_session_history view.”
Doug Burns responded with his item on session-level ASH reports: “I think [Tonguç's] post is really showing two different things, one more successfully than the other.”
Randolf Geist reports a Dynamic sampling and set current_schema anomaly: “If I’m now supposed to analyse a rather complex SQL with dozens of table and/or view references I like to use the ALTER SESSION SET CURRENT_SCHEMA command . . . This basically works just fine, but I recently have come across an interesting anomaly in 10.2.0.4: If the optimizer uses dynamic sampling as part of the parsing then it doesn’t do so anymore if the CURRENT_SCHEMA has been altered to a non-default schema.”
On the Oramoss Blog, Jeff Moss looks at the case of no pruning for MIN/MAX of partition key column: “Recently, I wanted to work out the maximum value of a column on a partitioned table. The column I wanted the maximum value for, happened to be the (single and only) partition key column. The table in question was range partitioned on this single key column, into monthly partitions for 2009, with data in all the partitions behind the current date, i.e. January through mid June were populated. There were no indexes on the table.” Jeff and his readers examine the evidence and discuss some workarounds to the problem.
(No SQL?!? Edwards, you’re mad!) Well, it’s not me. Here’s Curt Monash on NoSQL: “Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on.”
PostgreSQL stuff also had some thoughts on those who say No to SQL. “Every time I see something or hear something like this I sigh a little bit. Not only when it’s related to SQL but in the world of computer professionals in general. ‘The right tool for the job’ seems to be a hard concept to understand sometimes. I wonder why?”
We'd like to introduce a new awesome tool for Twitter, which is called Fosimo.TR and will be available in a couple of days for the public! It will change the way you tweet today and get updated with Twitter friends status updates.
In case you are not familiar with Fosimo yet. Fosimo is a free Facebook client tool for windows which helps you keep track of your Facebook account without browsing through Facebook.
Fosimo.TR is a free Twitter client for windows which is doing virtually the same but on Twitter. You can tweet, be synced instantly with friends status updates and much more including features that are not on the core Twitter.
Screenshot
As I mentioned: 1st version will be available shortly. We are already in the testing stage :)
It’s not the most common task in the world, but you might want to view processes from a particular user and once in a while you might even need to kill processes from a single user, be it during an attack or because you simply got a bug in an application bombarding your db server with connections!
Here is a small stored procedure which does exactly that!
call process_list(’show’,‘username’,‘hostname’);
– shows all processes owned by username@hostname
call process_list(‘kill’,‘username’,‘hostname’);
– kills all processes owned by username@hostname
The code for this stored procedure can be found below. If you have any comments / suggestions feel free to comment below.
###################################################################### ## ## ## Stored Procedure: process_list & kill user ## ## call process_list(’show’,'%’) show processlist for all users ## ## call process_list(’show’,'root’) show processlist for root user ## ## call process_list(’kill’,'user1′) kill connections for user1 ## ## ## ## by Darren Cassar http://www.mysqlpreacher.com ## ## ## ######################################################################
DROP PROCEDURE IFEXISTS process_list;
DELIMITER $$
CREATE PROCEDURE `process_list`( choice char(4), usernamein varchar(16), hostnamein varchar(60))
BEGIN
DECLARE CURCONN int;
IF choice <> ’show’AND choice <> ‘kill’ then SELECT"wrong choice";
END IF;
IF usernamein = ” then SET usernamein = ‘%’;
END IF;
IF hostnamein = ” then SET hostnamein = ‘%’;
END IF;
SET CURCONN=(SELECT connection_id());
IF choice = ’show’ then
SELECT * FROM information_schema.processlist WHERE ID <> CURCONN AND
USER LIKE usernamein AND ( HOST LIKE CONCAT(hostnamein ,‘:%’)OR
HOST LIKE hostnamein );
ELSEIF choice = ‘kill’ then
IF usernamein = ‘root’ then SELECT"Illegal username when killing processes";
ELSE SET @CNT = ( SELECT count(*) FROM information_schema.processlist WHERE ID <> CURCONN AND
USER LIKE usernamein AND ( HOST LIKE CONCAT(hostnamein ,‘:%’)OR
HOST LIKE hostnamein ) );
SET @VAR=1;
WHILE ( @VAR <= @CNT) DO
SET @TID = ( SELECT id FROM information_schema.processlist WHERE ID <> CURCONN AND
USER LIKE usernamein AND ( HOST LIKE CONCAT(hostnamein ,‘:%’)OR
HOST LIKE hostnamein )LIMIT1 );
SET @k = CONCAT(‘kill ‘ , @TID);
PREPARE killcom FROM @k;
EXECUTE killcom; SET @k=NULL;
This release is a patch file for MySQL's table partitioning feature. (ndb is excluded) This patch file works to add methods("cond_push" and "cond_pop") to ha_partition. As a result, every storage engine's table that use table partitioning can get condition-pushdown through ha_partition.
There was no problem because only ndb supported engine-condition-pushdown up to now. This patch was needed because the Spider storage engine supported engine-condition-pushdown recently.
Yesterday I had the honour of presenting my mini-bi/datawarehousing tutorial "Starring Sakila" for MySQL University. I did a modified version of the presentation I did together with Matt Casters at the MySQL user's conference 2009. The structure of the presentation is still largely the same, although I condensed various bits, and I added practical examples of setting up the ETL process and creating a Pentaho Analysis View (OLAP pivot table) on top of a Mondrian Cube.
The slides, session recording, and materials such as SQL script, pentaho data integration jobs and transformations, and Sakila Rentals Cube for Mondrian are all available here on MySQL Forge.
Copyright Notice
Presentation slides, and materials such as SQL script, pentaho data integration jobs and transformations, and Sakila Rentals Cube for Mondrian are all Copyright Roland Bouman. Feel free to download and learn from it. But please do not distribute the materials yourself - instead, point people to the wiki page to get their own copy of the materials. Personal use of the files is allowed. Use these materials for creating training materials of using these materials as training materials is explicitly not allowed without written prior consent. (Just mail me at roland dot bouman at gmail dot com if you would like to use the materials for such purposes, and we can work something out.)
The NoSQL event has triggered a bit of a hard time for the RDBMS the last week. I won’t add any commentary as this follows what I have been talking about for a while, but here are some of the links. Most notable is Michael Stonebraker’s post on the ACM site.
At Percona we provide services both Onsite - visiting the customers and Remote - logging in to their systems or communicating via email,phone,instant messaging.
We believe both approaches have their benefits and drawbacks and mixing them right way allows you to get your problems solved most efficient way.
Onsite visits are great as they allow consultant to meet your team in person and great for relationship building. It is great for architecture design and review as you can sit down with the team and use drawing board. It also often allows the best focus both for consultant and for participating team - when consulting visit is arranged it is usually the top priority for some of the staff members which provide consultant with information and assistance he might need.
Onsite visits also often allow to get prompt attention from other team - looking for network engineer to understand network topology or for someone from marketing to get the growth plans - they typically can be brought in for question.
I believe Onsite visits also offer unmatched training experience for the team - one of the team members can work side by side with consultants observing what he does and asking the question about the progress.
The Single Day visits are often extremely valuable as a kick off to do application redesign, for performance review or for starting long term Remote DBA or support relationships. They can be done with little planning as basically for any system there is enough work to spend efficiently understanding the system and working on the pressing problems customer may have. Such visits are often result in a lot of “homework” as application changes or implementing changes on production which when can benefit from remote followups to validate changes and provide further advice.
Multi day onsite visits require a bit more planning to be efficient. Many times in my career I would come to the customer for a week only to find I have created the implementation plan within 1-2 days and it will take a while to implement it and pass it through development and testing stages. Before long onsite visit it is very good to have a call with consultant to understand what exactly needs to be done, how much time and what resources it requires.
Projects which can be efficient as multi day visits are whenever implementation is required (coding unlike giving advice takes a lot of time) - migration projects, any forms of scripting, implementation (writing queries), hands on setting up MySQL, Replication, Monitoring, High Availability with MMM or DRBD are good examples.
Another good example of efficient Multi-Day visit is when there are multiple groups or multiple applications using MySQL - in such case each of the problem areas/groups can gets its own day or few hours which keeps consultant busy.
To make a Multi-Day visit efficient it is important to plan on what will be done - what people will consultant need to work with and which problems do they have. Providing proper access and having staging/test systems available are also often required.
Remote work has a benefit of consultant being available in the short time intervals and being able to multi task.
Working with the customer there are two types of “waits” which are often encountered. First - waiting on the customer. It could be fixing access problems, completing application changes, QA, scheduling downtime to implement changes and millions of other things. Second - waiting on database (or other) operations to complete - backup, restore, ALTER TABLE, load testing - all of these things take time and make consultant to wait for them.
In case you’re working remotely consultant typically has other tasks to do while such long processes are running. If you’re spending time onsite you may have other tasks or you may not while if you’re working remotely there are always things to do.
Another benefit of remote work - it truly allows to engage the team. If you have consultant onsite getting another one with some specific knowledge is complicated while getting another person to login to the system remotely for a quick look is easy and efficient.
Time is another interesting factor - onsite visits usually happen during office hours while changes to production are often avoided. With remote work it is much easier to do work which needs to be done at night hours.
Remote work however also needs to be well organized. The benefit of onsite work is - it is always planned at very specific time. Given date and time consultant will be there so both consultant and customer prepare for the visit. If appointment is just a phone call or online meeting there is larger chance it can be forgotten by ether party. In many cases the remote work not planned to the specific time at all which can cause it to be delayed because of either party delays.
If you need something to be done fast and you’re doing it remotely make sure to plan it to exact time and make sure there is someone to assist consultant if he needs any help such as access issues, some questions about setup or what exactly given queries are doing. Prompt responses can keep the ball rolling much faster.
Another trick to make remote work successful is to be very clear in your instructions especially if you’re not going to be available to assist consultant when he is doing the work. As remote work is often done cross time zones this becomes a very important factor.
In case you have a lot of work done in such “disconnected” mode it is a good idea to have daily/weekly/or be-weekly (depending on project intensity) calls to get team to discuss the progress and generally get team on the same page.
In general surprisingly a lot of things can be done in completely remote mode - we’ve done not only basic optimizations but architecture redesigns, large scale deployment, migrations and high available architectures implementation such a way often surprising customers (in a good way) of how little customers the work took.
The Great Mix As I mentioned before mixing Onsite and Remote work can be indeed the best mix especially for the complex projects. Onsite visit is great to setup relationships, get initial understanding of the system and get initial project planning. After it is done a lot of work can be done remotely quite efficient while there may be the benefit in onsite status-checks and team interactions as well.
The Scaling with Flash webinar I’ve mentioned earlier was a success and we got the recording available. It contains Percona presentation, presentation of Schooner appliances and Q&A session. Enjoy.
We are happy to announce that MySQL Proxy 0.7.2 is available in a source and binary release for all of our target platforms.
This latest release also brings back Windows support in both the source and binary release.
The list of important changes in this release is:
* fixed memory leak proxy plugin (#45272)
* fixed ro-balance.lua (#45408)
* added CMake build files
* fixed portability issues for Win32
* added mysql-proxy-svc on Win32
* updated INSTALL file to cover all the build steps on win32
I’ve had a few VCs ask how we compare to Hadoop and companies using MapReduce. With Google blessing MapReduce, it seems to be the cool new thing. I figure I’m going to have to explain this to VCs, so I might as well blog about it.
MapReduce is a process of dividing a problem into small pieces and distributing (mapping) those pieces to a large number of computers. Then it collects the processed data and merges (reduces) it into a result set. Hadoop provides the plumbing, so users focus on writing the query and Hadoop handles the dirty work of mapping and reducing. Such a query, using a procedural language like Java, is more complex than a comparable SQL query, but more on that below.
So what is MapReduce good for? It really shines when you want to summarize, analyze or transform a very large data set. This is why it is well suited to web data. Map reduce doesn’t utilize an index, so the tradeoff you need to consider is whether parallelizing the task provides greater efficiency than leveraging an index. If, for example, you want to find all customers who have placed more than one order in the past year, MapReduce can quickly scan the entire data set and discover how many have placed two or more orders in the past year. In this scenario, exploiting large numbers of computers in parallel is very fast. On the other hand, if you want to find an individual with a specific address, using the index to find that person is more efficient than processing the entire data set.
There is some overlap between the database and MapReduce worlds. MapReduce can help, as in the example above, to extract insights from large quantities of data that is not indexed. It is also helpful in feeding data into your database using processes known as Extract Transform and Load (ETL) and Extract Load and Transform (ELT), depending upon whether the transform step is done before or after loading the data into the database.
MapReduce has been addressed using procedural languages like Java. This is fine for batch processing of static data (e.g. historical sales data from last year). Yahoo’s Pig and Google’s Sawzall are attempting to expose the power of MapReduce through declarative languages like SQL, which would simplify the ad hoc processing of static data. Generally speaking, the processing of static data is the realm of Business Intelligence (BI) and Data Warehouses (DW).
The other realm of the database world is dynamic data. Dynamic data means data that is being modified on the fly, generally by multiple writers, while multiple readers are also trying to access the data. In this realm, you want to find a specific data element, lock only the smallest piece of data possible for an update, coordinate concurrent requests to change that data, all while ensuring that the changes are safe-using transactions-and the data remains consistent to readers. These are things that depend heavily on indexes and coordination or locking. MapReduce is not designed to handle these types of demands. Ad hoc processing of dynamic data, the realm of Online Transaction Processing (OLTP) databases is orthogonal to MapReduce. This also happens to be the realm of ScaleDB.
So, in short MapReduce can deliver performance gains in the processing of static data, namely in the BI and DW markets. Some projects to graft on declarative languages may further simplify this process. However, MapReduce does not provide value in the area of managing dynamic data such as the OLTP world.
So the answer is: MapReduce is cool and exciting, but there is no correlation between MapReduce and ScaleDB’s target market. We don’t use it, nor do we compete with it.
Earlier this year I had published a small blog about being efficient when using mysql prompt. This is a small continuation of it highlighting a couple of other cool features which I really find very useful when working command line (i.e. always!).
The first I’m gonna list here is setting the prompt itself by typing –
\R mysql \D >
thus enabling date and time display at each comand which is great for auditing and record keeping. There is a whole bunch of prompt values which you can find here:
Option Description
\c //A counter that increments for each statement you issue
\D //The full current date
\d //The default database
\h //The server host
\l //The current delimiter (new in 5.0.25)
\m //Minutes of the current time
\n //A newline character
\O //The current month in three-letter format (Jan, Feb, …)
\o //The current month in numeric format
\P //am/pm
\p //The current TCP/IP port or socket file
\R //The current time, in 24-hour military time (0–23)
\r //The current time, standard 12-hour time (1–12)
\S //Semicolon
\s //Seconds of the current time
\t //A tab character
\U //Your full user_name@host_name account name
\u //Your user name
\v //The server version
\w //The current day of the week in three-letter format (Mon, Tue, …)
\Y //The current year, four digits
\y //The current year, two digits
\_ //A space
\ //A space (a space follows the backslash)
\‘ ’//Single quote
\" "//Double quote
\\ //A literal “\” backslash character
\x //x, for any “x” not listed above
The second thing is the \e or edit which lets you edit a long query you might have been writing, in any editor you have set on your machine, most often vi, a must know editor for all of you out there.
Version 0.4 of libdrizzle has been released. This was mostly a maintenance release with build system changes and small bug fixes. This is the client and protocol library for Drizzle and MySQL that provides both client and server interfaces.
Version 0.4.1 of the Drizzle PHP Extension has also been released. James Luedke has moved development and releases of the extension into PECL, and has also fixed a number of bugs, extended the interface, and worked with the PHP/PECL developers to get the extension up to the proper PHP coding standards. Thanks James!
Yesterday I still worked for Sun Microsystems as a Principal Engineer & MySQL Sr. Architect. Today was my first day of work at Monty Program, Inc, a subsidiary of a tiny company established by Monty Widenius in February. Yet I didn't even make the top ten. If people want not to miss the train they have to hurry up.
Am I happy? Oh, yeah... No more waking up with the question constantly drilling my mind: “What am I doing here?” What are all of us, MySQL Server developers, doing without Monty? Waiting for the time when all our options are vested? I can't . That's too long for me. I'm already too old to wait any more. Besides, we've already lost at least 3 years. We have to do what we planned to do in 2005. We have to raise the Server to the level where any RDBMS that claims to be called mature should be.
So who is newly born? Me? In a way, yes. This is my second reincarnation for the history of MySQL, after the first one that happened in December 2002 .
Also newly born is this blog where I'm planning to share with you, from time to time, my observations on the Server development at Monty Program and on interesting patches in the new server code that other people contribute. However, I don't want to limit myself only to this topic. What else am I going to share with you? You'll see soon enough...
Version 0.8 of the Gearman C Server and Library has been released. This includes basic HTTP protocol support, build system improvements, and bug fixes.
For the last couple of months, we've been quietly developing a MySQL protocol parser for Maatkit. It isn't an implementation of the protocol: it's an observer of the protocol. This lets us gather queries from servers that don't have a slow query log enabled, at very high time resolution.
With this new functionality, it becomes possible for mk-query-digest to stand on the sidelines and watch queries fly by over TCP. It is only an observer on the sidelines: it is NOT a man in the middle like mysql-proxy, so it has basically zero impact on the running server (tcpdump is very efficient) and zero impact on the query latency. There are some unique challenges to watching an entire server's traffic, but we've found ways to solve those. Some of them are harder than others, such as making sense of a conversation when you start listening in the middle. In general, it's working very well. We can gather just about every bit of information about queries that mysql-proxy can, making this a viable way to monitor servers without the disadvantages of a proxy. In theory, we can gather ALL the same information, but in practice we are going for the 95% case.
As always with Maatkit, this has minimal dependencies. It doesn't require any Net::Pcap or other modules from CPAN. It's written in pure Perl, and it parses the output of tcpdump, rather than watching the network traffic directly. This might sound useless, but it's not. It means you can go tcpdump some traffic on a machine without Perl installed, and copy it to another machine for analysis, or send it via email to your friendly consultant, or do any of a number of other things. Decoupling things is very helpful sometimes.
Let's see how to gather queries and do something useful with them. I'll just watch the queries on a sandbox server on my laptop, and print out the profile synopsis so you can see how it works.
I'm kind of showing off the summary profile here to illustrate that you can get really compact results to see what's going on inside your server. What do you suppose that one query was that took a tenth of a second? We can find out.
select 1 from ( select sleep(.1) from dual ) as x\G
Indeed, it's no surprise the query took a tenth of a second to execute, and now you see where "SELECT dual" comes from.
Notice that it is inspecting the protocol enough to see the flags set in the protocol, indicating the warning count, error count, rows affected, and whether no index or no good index was available. Look at the top of the report -- what is up with the 12% of queries that say No_index_used? If we increase --limit a bit, we can see
I did not know that SHOW DATABASES sets the "no index used" flag, did you? Now we both do!
This is just a brief introduction to what the protocol parser can do. Of course, in real life it's much more useful than just seeing a query or two -- it has all the power of mk-query-digest for filtering, aggregating, printing and so forth.
Sun Microsystems, Inc. today announced that TweetMeme, the innovative online application that tracks the popularity of links sent via Twitter, has subscribed to Sun's MySQL Enterprise database offering in order to keep up with the microblogging service's explosive growth.
Created in 2008 by pioneering Web 2.0 startup Fav.or.it, TweetMeme gives the rapidly expanding Twitter community a means of easily seeing and sorting the most popular links on Twitter. TweetMeme also provides a realtime search facility of fully expanded, qualified and indexed links so that users can get answers to queries about very recent events or news.
As described in “Batching - improving MySQL Cluster performance when using the NDB API“, reducing the number of times the application node has to access the data nodes can greatly improve performance and reduce latency. That article focussed on setting up multiple operatations (as part of a single transaction) and then executing them as a single batch sent by the NDB API library to the data nodes.
The purpose of this entry is to show how a single NDB API operation can access multiple rows from a table with a single index lookup. It goes on to explain the signifficance of this both now and in the future (much faster joins using SQL for MySQL Cluster tables).
There are several operation types to cover table scans and index lookups (refer to the “MySQL Cluster API Developer Guide” for detals). For this example, I use an NdbIndexScanOperation.
The code sample assumes that the following table structure and data has been set up for table “COUNTRY” in database “TEST_DB_1″ using the NDB storage engine (Note that the full example application sets this up automatically):
The following code causes the NDB API library to send a single request from the application to the data nodes to read the rows where the primary key “SUB_ID” falls into the ranges (2<= SUB_ID <4); (5 < SUB_ID <=9) or (SUB_ID == 13). Note that this is just a fragment of the code and the error checking has been removed for clarity (refer to full example application to see the rest of the code, including the error handling).
NdbIndexScanOperation *psop;
/* RecAttrs for NdbRecAttr Api */
NdbRecAttr *recAttrAttr1;
NdbRecAttr *recAttrAttr2;
psop=myTransaction->getNdbIndexScanOperation(myPIndex);
Uint32 scanFlags=
NdbScanOperation::SF_OrderBy |
NdbScanOperation::SF_MultiRange |
NdbScanOperation::SF_ReadRangeNo;
psop->readTuples(NdbOperation::LM_Read,
scanFlags,
(Uint32) 0, // batch
(Uint32) 0) // parallel
/* Add a bound
* Tuples where SUB_ID >=2 and < 4
*/
Uint32 low=2;
Uint32 high=4;
Uint32 match=13;
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLE, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGT, (char*)&high);
psop->end_of_bound(0);
/* Second bound
* Tuples where SUB_ID > 5 and <=9
*/
low=5;
high=9;
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLT, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGE, (char*)&high);
psop->end_of_bound(1);
/* Third bound
* Tuples where SUB_ID == 13
*/
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundEQ, (char*)&match);
psop->end_of_bound(2);
/* Read all columns */
recAttrAttr1=psop->getValue("SUB_ID");
recAttrAttr2=psop->getValue("COUNTRY_CODE");
myTransaction->execute( NdbTransaction::Commit);
while (psop->nextResult(true) == 0)
{
printf(" %8d %8d Range no : %2d\n",
recAttrAttr1->u_32_value(),
recAttrAttr2->u_32_value(),
psop->get_range_no());
}
psop->close();
When run, this code produces the following output:
SUB_ID COUNTRY_CODE
2 1 Range no : 0
7 46 Range no : 1
8 1 Range no : 1
9 44 Range no : 1
13 1 Range no : 2
Why is this signifficant?
This can be very useful for applications using the NDB API; imagine an application that wanted to find the birthdays for all of my friends. Assume that I have 2 tables of interest:
mysql> describe friends; describe birthday;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name | varchar(30) | NO | PRI | NULL | |
| friend | varchar(30) | NO | PRI | | |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name | varchar(30) | NO | PRI | NULL | |
| day | int(11) | YES | | NULL | |
| month | int(11) | YES | | NULL | |
| year | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Using the NDB API, I can create 1 NdbIndexScanOperation operation to read all tuples from “friends” where the “name” field matches “Andrew” and then use the results to perform a second NdbIndexScanOperation to read the rows that match each of my friends’ names in the “birthday” table. In other words, performing a join using only 2 trips from the application to the data nodes.
Of course, it’s simple to get the same results using SQL…SELECT birthday.name, birthday.day, birthday.month FROM friends, birthday WHERE friend.name='Andrew' AND friends.friend=birthday.name;However, if the tables are very large and I have a lot of friends then performing this join using SQL can be expensive as it requires a separate trip to the data nodes to fetch each birthday. In the future, Batched Key Access (BKA) will optimise these joins by performing one of these NDB API Multi-Range Reads (MRR) to fetch all of the birthdays in one go! Until then, using the NDB API directly can deliver signifficantly faster results.
I'm pleased to announce the release of Vertical Partitioning storage engine version 0.2. http://launchpad.net/vpformysql
The main changes in this version are following.
- Support MySQL's table partitioning. MySQL's table partitioning that is only introduced last release is now supported. note: After this version, you need to apply a patch for installing Vertical Partitioning storage engine.
- After this version, the primary key part of InnoDB's secondary index is effectively used. You can use the primary key columns with InnoDB's secondary index by using following statements. create table tbl_b( col_a int not null, col_b varchar(20), col_c int not null, primary key(col_a), key idx1(col_c) )engine=InnoDB; create table tbl_c( col_a int not null, col_b varchar(20), col_c int not null, primary key(col_a), key idx1(col_c) )engine=InnoDB; create table tbl_a( col_a int not null, col_b varchar(20), col_c int not null, primary key(col_a), key idx1(col_c, col_a) )engine=VP comment='table_name_list "tbl_b tbl_c"'; note: Vertical Partitioning storage engine can use index with primary key together with index without primary key, because Vertical Partitioning storage engine can use different storage engines for child tables at same table. In this case, the purpose of writing the column of primary key to tbl_a's idx1 is to absorb the difference by this coexistence.
Please see "99_change_logs.txt" in the download documents for checking other changes.
With thanks to Patrick Galbraith and his DBD::drizzle 0.200 I am now able to test client benchmarks side by side with MySQL and Drizzle.
For simple benchmarking with clients, generally when I have little time, I use a simple Perl framework mybench. I was able to change just the connection string and run tests.
The diff of my two scripts where:
---
> my $user = $opt{u} || "appuser";
> my $pass = $opt{p} || "password";
> my $port = $opt{P} || 3306;
> my $dsn = "DBI:mysql:$db:$host;port=$port";
---
< my $user = $opt{u} || "root";
< my $pass = $opt{p} || "";
< my $port = $opt{P} || 4427;
< my $dsn = "DBI:drizzle:$db:$host;port=$port";
---
It’s too early to tell what improvement Drizzle will make. Just running my first test with single and multi thread tests shows an improvement in all figures in Drizzle via MySQL, however I will need to run this on various different versions of MySQL including the latest 5.0 to confirm.
tar4ibd is made to be sure that read of InnoDB page is consistent. Before we had some complains what in stream mode some pages are getting corrupted, and we suspect tar can do read of pages in time when they changed. So we patches libtar to make read consistent.
Export is added to support moving .ibd tablespaces between servers.
The list of other features in the release includes:
Support of ARCHIVE tables
Addded incremental option to innobackupex-1.5.1 script
Rollback XA PREPAREd transaction automatically at --prepare
To extend tablespace size, if needed after --prepare
Data page corruption check is added to local backup mode
I've now posted all the current talk submissions to the OpenSQL Camp Wiki. A big Thank You to everyone who contributed so far and helped us to bang the drum for this event! If you haven't heard about OpenSQL Camp yet, it's a subconference of the Free and Open Source Conference (FrOSCon) in St. Augustin, Germany, which takes place on August 22+23. The topic of OpenSQL Camp is "Open Source databases and related technologies" and we're looking for interesting presentations in this field.
As we have 12 session slots to fill, we still have room for at least 6 more submissions! It's also a tad bit MySQL-centric at the moment, that should definitely change! We would love to get some more diversity to cover a broader range of Open Source Database technology.
So please submit your talk proposals and help spreading the word — the Call for Papers is still open until July, 19th! Post a message to relevant discussion forums and mailing lists. Know an expert in this field? Approach him directly! OpenSQL Camp Speakers will receive free entry to FrOSCon, which is worth visiting in any case!
I've been working with SMF on and off for a few weeks now and still get stung by this error. If you see something like this in the SVC log after enabling a service (i.e., svcadm enable mysql):
bash-3.00# cat /var/svc/log/application-database-mysql:v51.log [ Jul 1 07:36:37 Enabled. ] [ Jul 1 07:36:37 Executing start method ("/lib/svc/method/mysql start") ] svc.startd could not set context for method: chdir: No such file or directory [ Jul 1 07:36:37 Method "start" exited with status 96 ]
head over to the manifest file for the service and check the method_context section. In my case, I had no working_directory element set so adding this element solves the problem.
My FC10 to FC11 yum upgrade got stuck on a zillion dependencies .. well actually libssl and mysql from Remi .. but from there it's a whole chain of other things. So I had the great idea to go for the F11 Live CD, the LiveCD works like a charm,, only upon trying to install the live CD to my OS partition it started failing on me with a bunch of squashfs errors ,
It really looks like this F11Beta blocking issue which apparently didn't get solved completely after all.. the annoying part is that I can't reproduce the issue anymore .. as my system is now fully working .. maybe on my office desktop next week :)
Oh well.. the full DVD ISO got downloaded and burned and now I can once again enjoy the annoyancies of a freshly installed distro, missing packages, configs that have lightly chaged etc :) Wonder if sound is working out of the box now :)
Update: It works .. partly ... (no sound in firefox it seems atm..)
In today’s gp/bp an open door will be kicked in: take your backups offsite!
I was actually tempted to create a poll to see how many of you do not have proper backups, and how many of you do not take those backups offsite. It is a simple piece of advice and relatively simple to set up. Offsite in this case would ideally be physically offsite: to a different server in a different building in a different location. A start however is to take them to a different server. And don’t make the mistake of thinking a different VPS on the same physical server is good enough. True, that will protect you from operating system failure of the guest, but it will likely not protect you from hardware failure, or operating system failure on the host OS.
Also, take good care of how you are getting your backups offsite. A normal FTP connection might do the job, but it is hardly secure. Ideally, use SFTP or rsync over ssh to stream your backups offsite.
Some people still take their backups offsite by physically moving a cd, dvd or tape from one location to another. It’s a good start, but in this age of cheap broadband, you might want to think about doing this online. A cron-job is much less likely to not run than it is for you to forget to move that tape.
In our good practice / bad practice series, we will provide you with byte/bite sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level of advancedness greatly varies. A topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to tender to all of you!
PHP 5.3 is released and after the release stress is over my mind is open for new ideas. While relaxing yesterday I thought about many things, among them was the Resultset iterator I recently discussed.
Now I wondered where to go next with this and had the idea that an individual Resultset is a child of the whole result and this might be wrapped in an Recursive Iterator. For doing so we don't implement the Iterator interface but RecursiveIterator. RecursiveIterator extends a typical Iterator with two methods: hasChildren() and getChildren(). But now we have a problem: The Iterator returned by getChildren() has to be a RecursiveIterator, too, which makes sense, in general. But I want to return a MySQLi Resultset which isn't recursive - so making this a RecursiveIterator is wrong. My solution now is to introduce yet another Iterator which goes by the name of MySQLi_PseudoRecursiveResultIterator and is implemented by extending IteratorIterator which will wrap the MySQLi_Result and implements RecursiveIterator telling the caller that there are no children.
As a sidenote: In our experimental treeAndrey made MySQLi_Result an iterator but that's not yet in php.net's CVS (might need some more testing, and probably we might change the design there...) so I'm emulating this with MySQLi_Result::fetch_all() combined with an ArrayIterator, using the experimental code the constructor can be dropped.
So let's finally look at the code of these two classes:
<?php
class MySQLi_ResultsetIterator implements RecursiveIterator
{
private $mysqli;
private $counter = 0;
private $current = null;
private $rewinded = false;
public function __construct(mysqli $mysqli) {
$this->mysqli = $mysqli;
}
private function freeCurrent() {
if ($this->current) {
$this->current->free();
$this->current = null;
}
}
public function rewind() {
if ($this->rewinded) {
throw new Exception("Already rewinded");
}
$this->freeCurrent();
$this->counter = 0;
$this->rewinded = true;
}
public function valid() {
$this->current = $this->mysqli->store_result();
return (bool)$this->current;
}
public function next() {
$this->freeCurrent();
$this->counter++;
$this->mysqli->next_result();
}
public function key() {
return $this->counter;
}
public function current() {
if (!$this->current) {
throw new Exception("valid() not called");
}
return $this->current;
}
public function hasChildren() {
return true;
}
public function getChildren() {
return new MySQLi_PseudoRecursiveResultIterator($this->current);
}
}
class MySQLi_PseudoRecursiveResultIterator
extends IteratorIterator
implements RecursiveIterator
{
public function __construct(MySQLi_Result $result) {
// This ctor can be dropped with the experimental bzr sources
// as IteratorIterator::__construct() directly works with
// MySQLi_Result
parent::__construct(new ArrayIterator($result->fetch_all()));
}
public function hasChildren() {
return false;
}
public function getChildren() {
throw new Exception("This should never be called");
}
}
?>
Now we can use this code. For properly using a RecursiveIterator one should use a RecursiveIteratorIterator (RII). To get some nice labels I'm extending the RII and then have a single foreach:
<?php
class MyRecursive_IteratorIterator
extends RecursiveIteratorIterator
{
public function __construct(MySQLi $mysqli, $flags = 0) {
parent::__construct(
new Mysqli_ResultSetIterator($mysqli),
$flags | RecursiveIteratorIterator::LEAVES_ONLY);
}
public function beginChildren() {
echo "Next ResultSet:\n";
}
}
$mysqli = new MySQLi("localhost", "root", "", "test");
$query = "SELECT 1,2 UNION SELECT 3, 4;".
"SELECT 'hi world' UNION SELECT 'foobar'";
if ($mysqli->multi_query($query)) {
foreach (new MyRecursive_IteratorIterator($mysqli) as $key => $row) {
printf(" %s\n", $row[0]);
}
}
?>
Now calling this code gives us a result similar to the following:
Next ResultSet:
1
3
Next ResultSet:
hi world
foobar
Isn't that nice? - I think that's a cool API! What do you think? Do you have use cases for such an API? Should we implement this in C and bundle it with PHP? Any feedback welcome!
There are some bug reports on the auto-discovery protocol in MySQL Cluster. The idea of the auto-discovery protocol is to fetch the .frm files for the NDB tables stored in the data dictionary of the data nodes, and put them in the data directory of the mysql server.
However, sometimes (not always, which makes it more difficult to reproduce and hence fix), the auto-discovery seems to make strange things (from this bug report):
After shuting down and restoring my cluster I get the following error.
This is due to the files already being in the mysql data directory. After the error the frm does not match the data in memory this causes the following. When running select count(*) from tablename; You will get an accurate count. When running select * from table name; You get an error Can't find record in tablename.
I have seen it as well at some customers usually with bigger installations and many tables.
My current recommendation (work around) is to delete the FRM files associated with the NDB tables in the mysql server data directory before you start the mysql server(s).
So this is what i always include in my MySQL server startup scripts (and is included in the Configurator scripts):
files=`find $mysql_datadir -name "*.ndb"` for f in $files do x=`basename $f .ndb` #make sure we leave out ndb_binlog_index and ndb_schema since they are myisam tables if [ "$x" == "ndb_binlog_index" ] || [ "$x" == "ndb_schema" ] ; then echo "Ignoring $x" else y=`echo $f | sed -e 's#ndb#frm#'` rm -rf $f rm -rf $y fi done #start the mysqld here
If I want to restore data I usually:
stop the cluster
start the data nodes with --initial
stop the mysql servers (make sure they are not started)
restore the cluster data
start the mysql servers (clearing out whatever .frm files coming from the ndb tables)
Some key differences for DBAs between Oracle and MySQL database servers include:Different tools used to manage and monitor database servers.Oracle architecture is process based, MySQL architecture is thread based.Different tools used for backup and recovery.Database specific SQL syntax.Database specific SQL functions.Different syntax for stored routines. MySQL has no packages.MySQL routines are
There are different ways for starting up and shutting down a MySQL server. A MySQL server when it starts up it starts up all the way. When a MySQL server shuts down it shuts down all the way. MySQL cannot startup and and shutdown at different levels the way an Oracle database server can.
Oracle DBAs understand the importance of defining guidelines and standards and using them in the configuration and management of database servers. Years ago the Optimal Configuration Architecture (OFA) developed a base set of guidelines and naming conventions for Oracle DBAs.I developed the MySQL Optimal Configuration Architecture (MOCA) modeled after OFA to give new MySQL DBAs a set of
Storage needs to be managed for any MySQL instance. A MySQL database server does not have the same flexibility as an Oracle database server in terms of how to lay out the data storage. It is important to learn the data storage defaults and options.
An Oracle DBA needs to learn how to manage a MySQL Instance. The MySQL instance has different memory buffers and caches like any database server. Part of MySQL's uniqueness is that it has memory areas to be configured that are associated with different types of storage engines.
MySQL is an database server and anyone managing the MySQL database server has to learn the architecture, and feature functionality like any other database server. The biggest mistake to make with a new MySQL environment is thinking it is a toy and trivial to work with. A MySQL server needs to be properly configured, tuned and managed like any other database server.The key to learning any database
As MySQL continues to explode in the marketplace, there are more Oracle and SQL Server DBAs learning MySQL. For Oracle DBAs, its important to understand that MySQL is a different type of database server.Oracle is the aircraft carrier of database servers. Oracle has by far more features than any other database server. Oracle supports a number of database server solutions such as Real
We just announced the latest MySQL Workbench 5.1 release, 5.1.16. This release was labeled GA, which means that everybody using WB 5.0 or an earlier version of WB 5.1 is recommended update to WB 5.1.16.
We are now putting our main focus on the WB 5.1 branch and will continue to provide rapid update releases during the next months. Please report any hidden issues you may still find on our bug-tracker bugs.mysql.com so we can take care of those quickly.
After the dust settles we will also continue to work on WB 5.2 in parallel, to give you query & administrative functionality within a single, powerful but easy to use tool.
A short time ago I posted how I was Using XtraDB Backup to backup InnoDB. Overall, the blog post was positive, but experiences that others have had (and commented to on that blog post) have made me want to put out another short article about using XtraDB backup.
The first few points remain the same — the backup process is stable, we were able to use the binaries without compiling, and using Innobackupex as the wrapper script, analogous to Innobackup.pl.
However, we did figure out why Xtrabackup had to be run as the mysql user:
Xtrabackup writes to the data dictionary file (ibdata1, for example). We have not examined if it also writes to the data and index files (still ibdata1 by default, or the .ibd files when using innodb_file_per_table). [EDIT: The authors of Xtrabackup have commented below as to why the write occurs:
xtrabackup is kind of very small and restricted instance of InnoDB, and reuses a lot of InnoDB code.
InnoDB by default requires O_RDWR option on ibdata1 files at start, and xtrabackup therefore also did that. In the latest push to trunk it was fixed, now xtrabackup opens file with O_RDONLY flag.
When the new version is released, we will be sure to test it so that we can run the backup as a read-only user, and report back.]
On the one hand, Xtrabackup is a free tool. On the other hand, modifying InnoDB’s underlying files risks corrupting all the InnoDB tables in the system. Which is a tricky situation when it is your backup tool that might cause corruption that is beyond repair, as you do not know if you can trust your backups.
Regarding the complaint Dan R commented on the previous post that innobackupex could stream the backup to gzip, and helpfully gave the syntax. Shlomi Noach also pointed out streaming as a feature of Xtrabackup that ibbackup does not have. However, Gerry Narvaja, a co-worker noted (and commented):
I’ve been trying to install xtrabackup/innobackupex for a customer and I’m finding a few glitches, especially w/ streaming:
We use innoback(ex) wrapped in our own scripts to allow for rotation and other operations. We also use ZRM for some installations, so this would apply to integration with this tool as well. These are the glitches I found:
1. Using streaming by piping it into ‘gzip’ masks the return code from innobackupex.pl. Since gzip will almost always return 0, you can’t rely on it to determine backup success.
2. The next alternative would be to review the innobackupex.pl’s output for the OK at the end. But since it redirects the output to ’stderr’ to allow for streaming, you need to add “2> innobackupex.log” before piping and grep for the “OK” at the end.
and noted that there were some limitations:
innobackupex script is limited in the type of options you can specify compared to what the xtrabackup executable supports. I find this annoying since it limits the directories where you can have the backups, data directories and logs.
Xtrabackup doesn’t work for MySQL v4.1. In the Percona forums there was a suggestion that the 5.0 patch should work. This is true, but xtrabackup.c has other dependencies on 5.x definitions and structures I didn’t have time to review. Baron Schwartz correctly suggested that these dependencies might be trivial in a tweet directed to myself. I’ll post my findings to the Percona forums and hopefully we can soon have a patched version.
MySQL Workbench Version 5.1.16 the General Availability build can be downloaded at http://dev.mysql.com/downloads/workbench/5.1.html.Congratulations to the Workbench team for their great effort in this release.
We’re proud to announce the first General Availability build of our database development tool MySQL Workbench. Version 5.1.16 is now available on our download-servers. On our final run towards GA release of version 5.1 we managed to close 13 bugs and a couple of small issues still found in our previous release. Details on latest changes can be found on our Releases Page. Starting with this release, we recommend to switch from your Workbench 5.0 installation right to the new release (if you haven’t yet) - 5.1 is now our current stable release.
The files for different platforms have been pushed to our main server and will be globally available on our mirrors during the next hours. Please get your copy form our Download page:
If you encounter any problems, please report them to our bugtracker. Feel free to contact us any time on IRC (irc.freenode.net, #workbench) for any questions or infos.
Earlier I reported about two crashes related to MySQL 5.0.22 on Ubuntu 6.06 LTS.
I think those bugs show a lack of testing on the side of Cannonical/Ubuntu. And for MySQL there is a quite good test suite available, so it's not rocketsience.
There are multiple reasons why you could use the MySQL Test Framework: 1. Test if bug you previously experienced exists in the version you are using or planning to use. 2. Test if configuration changes have a good or bad result on the stability of mysqld. 3. Test if important functions still return the correct results (especially importand for financial systems)
There is a utility for MySQL Cluster called ndb_cpcd which is a test utility for development. It controls processes and listens on a TCP port. Is has some overlap with the MySQL Instance Manager. But there is very few documentation available about it and I wonder if anyone is actually using this. And the other test utility for NDB is ndb_test_platform. I could not find any documentation about it ndb_test_platform, so i guess I'm not supposed to use it?
I see this benchmark being quoted in multiple places, and there I see stuff like:
When carrying out more database benchmarking, but this time with PostgreSQL, XFS and Btrfs were too slow to even complete this test, even when it had been running for more than an hour for a single run. Between EXT3, EXT4, and NILFS2, the fastest file-system was EXT3 and then its successor, EXT4, was slightly behind that. Far behind the position of EXT4 were NILFS2 and then Btrfs and XFS.
There were few other benchmarks, e.g. SQLite showed ‘bad performance’ on XFS and Btrfs.
*clear throat*
Dear benchmarkers, don’t compare apples and oranges. If you see differences between benchmarks, do some very very tiny research, and use some intellect, that you, as primates, do have. If database tests are slowest on filesystems created by Oracle (who know some stuff about systems in general) or SGI (who, despite giving away their campus to Google, still have lots of expertise in the field), that can indicate, that your tests are probably flawed somewhere, at least for that test domain.
Now, probably you’ve heard about such thing as ‘data consistency’. That is something what database stack tries to ensure, sometimes at higher costs, like not trusting volatile caches, enforcing certain write orders, depending on acknowledgements by underlying hardware.
So, in this case it wasn’t “benchmarking file systems”, it was simply, benchmarking “consistency” against “no consistency”. But don’t worry, most benchmarks have such flaws – getting numbers but not understanding them makes results much more interesting, right?
The second Release Candidate of PBXT, version 1.0.08, has just been released.
As I have mentioned in my previous blogs (here and here), I did a lot to improve performance for this version.
At the same time I am confident that this release is stable as we now have a large number of tests, including functionality, concurrency and crash recovery. But even more important, the number of users of PBXT has increased significantly since the last RC release, and that is the best test for an engine.
So there has never been a better time to try out PBXT! :)
You can download the source code, and selected binaries from here: primebase.org/download.
Vladimir and I have made a lot of changes, for details checkout the release notes.
There is also a new PBXT mailing list, so if you have any questions this is the best place for them.
PBXT is a high-performance, MVCC-based, transactional storage engine for MySQL. The project is open source (GPL) and hosted on Launchpad. PBXT supports referential integrity, row-level locking and is fully ACID compliant.
For more information please go to the PBXT home at: primebase.org.
So, since there is Firefox 3.5 and since there are dolphins (mascot of MySQL of course), lets look for a relation between Firefox 3.5 and MySQL which not everybody necessarily knows yet.
If you have delete-intensive workloads on InnoDB, then you need to understand how purge works. Dimitri has an interesting post on this. And I wrote about measuring purge lag.
I haven't had to deal with this problem yet, but the insert benchmark has a new option to make it delete intensive. So, I think I can reproduce workloads that generate a lot of purge lag.
ScaleDB provides a pluggable storage engine for MySQL that delivers shared-disk clustering . Brian Akers once described ScaleDB as "the closest thing to Oracle RAC for MySQL." The ScaleDB storage engine turns MySQL into a clustered database, where all of the nodes share the same data. It eliminates the need to partition the data. It also allows you to add and remove nodes without interrupting the application. It will (in time) provide high-availability, but we can’t promise that for the beta version.
So we’re looking for companies with problems we can uniquely solve. We view the beta process as an investment on both sides, we invest in supporting you and making you successful and you invest your time using our software. I have found that beta testers will invest more time when you solve a problem that they cannot easily solve with alternatives.
Our shared-disk clustering and Multi-Table Indexing technologies can make a huge difference in the performance, scalability, while dramatically reducing the time and effort it takes to set-up and maintain a database cluster. Some of the problems we are really good at solving are:
1. Data Partitioning Challenges : We all know that no matter how you partition your data, there are performance pros and cons. There is no silver bullet in partitioning. Our solution is DON’T PARTITION. With shared-disk databases, it is like a trough where all of the nodes feed on the same data.
2. Large Data : Large data, large tables, we can handle them.
3. Complex Relationships : Addresses the performance impact of exploring complex (parent-child) relationships.
4. Joins : complex (multi-table) joins. Our multi-table index delivers single table performance across multi-table joins.
5. Indexing Large Keys : If you are indexing large keys, like URLs, we can help. Our index is (a) very compressed; (b) independent of key length.
6. Large Indexes : Our index is very compressed, so we can push more of your indexes into memory, resulting in improved performance on commodity machines.
7. Dynamic Scalability : If you need to add/remove database nodes on the fly, without interrupting the application, shared-disk is the way to go. This is ideal for cloud computing . In time, we will also add high-availability, but not in the beta version.
Pros
* ACID-transaction compliance
* Shared-disk architecture eliminates the need to partition data
* Dynamic scalability enables adding/removing nodes on the fly, ideal for cloud computing
* Support for crush recovery of any node in the cluster
* Master-only cluster eliminates slave replication and promotion
* Row-level locking
* Support for foreign keys
* Support for referential integrity
* Multi-Table Index (provides the functionality of materialized views)
* Highly compressed Indexes
* Supports read committed
Cons
* No full-text or GIS index support
* No MVCC (planned for a future release)
If you have one or more of the challenges described above, and you think we can be a good solution, please contact me at mike [at] scaledb [dot] com, or complete the beta form here .
As you see MySQL is doing great in InnoDB performance improvements, so we decided to concentrate more on additional InnoDB features, which will make difference.
- Stick some InnoDB tables / indexes in buffer pool, or set priority for InnoDB tables. That means tables with bigger priority will be have more chances to stay in buffer pool then tables with lower priority. Link to blueprint https://blueprints.launchpad.net/percona-patches/+spec/lru-priority-patch
- Separate LRU list into several lists, and in this way it will allow us to emulate several buffer pool, with features to keep different tables in different buffer pools and also to decrease contention on buffer pool. Link https://blueprints.launchpad.net/percona-patches/+spec/multiple-lru-patch
- We are looking to include Waffle Grid into XtraDB releases with some additional features like caching buffer pool on SSD.
If ideas are interesting for you and you want to support them, contact us
Danfoss Electronic Controls & Sensors, one of Denmark's largest industrial companies, today announced it has selected Sun Microsystems' MySQL database to help power its air conditioning software application AKM, Adap-Kool Monitoring. The AKM software enables commercial facilities, such as grocery store staff, to monitor and control temperature levels inside store premises. Embedding MySQL into AKM will enable Danfoss to increase the number of concurrent users of the application, while maintaining a high level of availability and performance.
check-unused-keys can be invoked and used as follows:
%> check-unused-keys --help
Usage:
check-unused-keys [OPTIONS]
Options:
-d, --databases=<dbname> Comma-separated list of databases to check
-h, --help Display this message and exit
-H, --hostname=<hostname> The target MySQL server host
--ignore-databases Comma-separated list of databases to ignore
--ignore-indexes Comma-separated list of indexes to ignore
db_name.tbl_name.index_name
--ignore-tables Comma-separated list of tables to ignore
db_name.tbl_name
--options-file The options file to use
-p, --password=<password> The password of the MySQL user
-i, --port=<portnum> The port MySQL is listening on
-s, --socket=<sockfile> Use the specified mysql unix socket to connect
-t, --tables=<tables> Comma-separated list of tables to evaluate
db_name.tbl_name
--[no]summary Display summary information
-u, --username=<username> The MySQL user used to connect
-v, --verbose Increase verbosity level
-V, --version Display version information and exit
Defaults are:
ATTRIBUTE VALUE
-------------------------- ------------------
databases ALL databases
help FALSE
hostname localhost
ignore-databases No default value
ignore-indexes No default value
ignore-tables No default value
options-file ~/.my.cnf
password No default value
port 3306
socket No default value
summary TRUE
tables No Default Value
username No default value
verbose 0 (out of 2)
version FALSE
** FULLTEXT indexes are not taken into account by this patch, so be wary. And, as always, sanity check the suggestions and test thoroughly before making changes to production.
I have not got a chance to have a standard session on Velocity Conference this year, However I'm hosting the BOF. Bring your flash, IO Performance, and general performance stories and questions if you have any.
I also plan to attend the conference afternoon on Monday and mid day on Tuesday.
I will co-present in webinar on Performance Challenges and Solutions for IO Bound Workloads in MySQL. My part of the presentation will be speaking about why switching from CPU bound workload to IO bound is such an important event, how to prepare to it as well as how to keep your application performance good as the data growths.
The Brian's portion of webinar will focus on the Schooner offering as example of flash based appliance - one of solutions I mention in my presentation.
It should be interesting whenever you're interested in Schooner appliance offerings, flash or scaling MySQL in General.
Facebook opened vanity urls to the public for free and during the landrush last Saturday over a million urls were taken.
What did you get?
I agree with techcrunch that the coolest personal url taken last week is default.aspx. LOL.
I got my first name: anatoly. Just the same as I own on twitter and on linkedin.
The rush will continue on June, 28th when Facebook pages urls will become fully open. Right now pages need more than 1000 fans threshold to be able to get vanity url.
According to techcrunch Myspace reduced its staff this week by 480 people.
The bad news is that many of them are key employees and had contributed a lot to Myspace during the years. Including people from Myspace dev. platform.
We are quickly approaching 1 million users on Friendster. Going to hit 1 million users mark next week or so. It will be our 1st platform so far where we reach 1 million installs.
x2line has 7-8 applications on Friendster with over 50K users each. And still growing.
Today marks the official launch of Percona.tv. We'll be uploading technical screencasts, conference video, and anything else cool we can think up. If you've got ideas or requests, let us know and we'll do our best to accommodate!
It is a known fact that ext3 is not the most efficient file system out there and for example file removals can be painfully slow and cause a lot of random I/O. However, as it turns out, it can sometimes have a much more severe impact on the MySQL performance that it would seem. When or why?
When you run DROP TABLE there are several things that need to happen – write lock on a table so it cannot be used by any other thread, the data file(s) removal by the storage engine and of course in the end MySQL has to destroy the definition file (.frm). That's not all that happens, there is one other thing:
The entire code which removes a table is surrounded by LOCK_open mutex. The mutex is used in various places inside MySQL, but primarily when tables are being opened or closed. It means that when LOCK_open is locked, no query can be executed because they are stopped from accessing any table.
Here is when the slow file removal on the ext3 file system starts to be a pain. Deleting a 10GB file can take a few seconds and if that is a MySQL table, the mutex remains locked for all that time stalling all queries:
I tried some alternative approaches to trick MySQL into removing smaller files with DROP TABLE to minimize the effect, such as:
TRUNCATE TABLE large_table; ALTER TABLE large_table ENGINE=…; DROP TABLE large_table;
TRUNCATE TABLE large_table; OPTIMIZE TABLE large_table; DROP TABLE large_table;
Unfortunately as it turned out each of the administrative commands like ALTER TABLE or OPTIMIZE TABLE one way or another uses LOCK_open mutex when the old table files are deleted:
A better solution through MySQL internals could be to simulate the table drop by renaming the corresponding data file or files and physically removing them outside of the mutex lock. However it may not be that simple, because the actual removal is performed by the storage engine, so it's not something MySQL code can control.
This is certainly not a common situation, but may become a problem to anyone when it's the least expected (e.g. dropping old unused tables).
So, when you connect on the fly to a database your subject to a variety of issues, like when the db is not available and when the db does not have a route.
One of the main reasons why a dev may want to connect on the fly is because they have too many front ends to hold a persistent connection on the backends. Since mySQL does not use libevent, holding open threads to mySQL is much more costly. Threads == Memory.
But, that's here nor there. The main purpose of this post is to talk about how to recover from failed connections that block apache threads.
Common Failures: No route to Host Flapping NIC Locked Tables Recovering from a Crash more of the same.
My Environment: I have a bunch of webservers (200+) that all have 300 possible threads (60000 possible connections to a single DB) behind a load balancer that uses the LB least connections protocol to distribute load across all 200+ webservers.
Since PHP is used in a stateless mode (no guarentee that the same user will hit the same server), I have to have some way of telling all other apache procs for said box that a server is down. I use stateless on the fly connections, so each apache proc will test the connection.
Here in lies my most common problem. If a box in the backend dies, all apache threads will block for a predefined time-out.
In /etc/php.ini (it could be in /etc/php.d/mysql.ini) I set
mysql.connect_timeout = 5 // the default is 60
In my common_db class when connecting to a database, the connect routine returns a database handle object
if (PEAR::isError($dbh) && $delta_to_connect >= 5){
Now PRIOR to calling the database connect code I check to see if the IP is up.
$status = apc_fetch($ip); if ($status[0] === 'DEAD'){ return false; }
... do connect ...
But, there could be a variety of issues that can call false positives, like network flaps, someone blocking the db for some time etc. So, I'll allow one request .1% of the time bypass the status check to try again.
But, if your app can't connect to the DB aren't you down anyway?
Yes, although I don't have to restart all the httpds. Also each DB has a redundant pair to when returning false, the app code will try the other set of servers.
I have a variety of methods to deal with these scenarios but this is the quickest to implement.
I'd like to introduce Baby Toys application which is connected to Facebook via "Connect". Means that this version of Baby Toys is a standalone application while users need to have Facebook account in order to authorize on and use this application. See Chocolate Lovers app which has the same concept.
Still a kid at heart? Send baby toys to friends or put on your profile!
Infoworld has an article "The fight over open source 'leeches', about the failure of enterprises to contribute the changes they make to open source software back upstream to the project.
"When it comes to open source communities, individuals are much better citizens than institutions. The enlightened self-interest that causes individuals to send back bug fixes, contribute ideas for new features, and write documentation is much harder to find in institutions," Dan Woods, CTO of Evolved Media, wrote in Forbes earlier this year.
I have a very old article in my german language blog about Open Source Software and companies, and the infoworld article triggered me now to translate it:
Many companies do use open source, and that is a good thing. Most of that software is well suited to the task, and usually it is much easier to understand, more flexible and also easier to debug than their closed source equivalents. But open source requires that enterprises adjust their process with regard to changes and deployment of software in order to be successful open source users.
The fact that you can change open source software lures many companies into making changes to the software. But making the changes is just a small part of the work, and the most inexpensive one.
There are two PRIMARY directions to sort data in SQL: Ascending (ASC) and Descending DESC. When these two sort definitions are put together in a single statement a filesort is produced.
Why do we want to avoid filesorts?
Filesorts are bad. 1st they tickle a thread based buffer called sort_buffer_size. Additionally filesorts reads the data twice, unless max_length_for_sort_data limit is reached and as a result the Filesort runs slower to reduce disk I/O. If you want filesorts to run faster at the expense of the disk increase the default max_length_for_sort_data. You can read the filesort algorithm here.
So, here is an example
CREATE TABLE `ABCD` ( `A` int(10) unsigned NOT NULL default '0', `B` int(10) unsigned NOT NULL default '0', `C` int(10) unsigned NOT NULL default '0', `D` int(10) unsigned NOT NULL default '0', PRIMARY KEY (`a`,`b`,`c`,`d`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC, d ASC\G
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: ABCD type: ref possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: const,const rows: 2 Extra: Using where; Using index; Using filesort 1 row in set (0.00 sec)
Notice the filesort? So how does one get around this filesort?
Well
Let's define some roles for columns C and D. C is the parent while D is the child.
We want all the latest parents (C) We want all the oldest children (D)
We require pagination of all the PARENTS (show 10 parents per page) so Queries like this is PRODUCED
SELECT * FROM ABCD WHERE A=? AND B=? ORDER BY C DESC explain SELECT * FROM ABCD WHERE a=1 AND b=1 ORDER BY c DESC LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ABCD type: ref possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: const,const rows: 2 Extra: Using where; Using index 1 row in set (0.00 sec)
Now
FOREACH($C_parent as $i => $c_id) {
$C_parent[$i] = SELECT SQL_CALC_FOUND_ROWS * FROM ABCD WHERE A=? AND B=? AND C=$c_id ORDER BY D ASC LIMIT 1;
}
So, we changed 1 query into 11 queries (10 parents per page) to make the page load happen faster, by getting rid of the filesort.
What 11 queries is faster then 1? Yes, for this case it is. The reason is because filesorts are SLOOOOW, they chew up a lot of limited resources and they should be avoided. I've see filesorts take close to 50-60% of the query time.
This is the translation of an article from my german language blog. It is not a literal translation, but has been amended and changed a bit to take more recent information into account.
It started out as a discussion within the german language MySQL group in USENET. There the eternal question came up why phpMyAdmin gets no love at all from the helpers and regulars in that group. My answer was:
phpMyAdmin (PMA) like many other GUI tools for MySQL has a number of limitations. For a web tool such as PMA these come from its operating principles and can hardly be changed. But let's start at the beginning:
In MYSQL the connection is a special context or scope for many things. At least the following things are part of the connection scope:
Transactions. A disconnect implies a ROLLBACK.
Locks. Transactions generate locks with writing statements or SELECT for UPDATE. Table locks are generated by LOCK TABLES. Disconnect releases the locks.
The number returned by LAST_INSERT_ID() is cached within the connection context. It is unavailable after a disconnect.
Tables created by CREATE TEMPORARY TABLE. They are deleted on disconnect.
Prepared Statements and the parsed bytecode of stored routines are kept per connection, even if that is the wrong solution from a performance POV.
@-variables. SET @bla = 10 or SELECT @bla := count(*) FROM cookie; are defining variables within the context of a connection. They are lost on disconnect.
SESSION-parameters. SET SESSION mysiam_sort_buffer_size = 1024*1024*64 or SET @@session.myisam_sort_buffer_size = 1024*1024*64 are setting configuration parameters with session scope. They are lost on disconnect.
Replication specific SET-commands such as SET TIMESTAMP or SET LAST_INSERT_ID can affect the behavior of functions such as SELECT NOW() or SELECT LAST_INSERT_ID().
I am calling all of that connection scoped state.
A client that is disconnecting in an uncontrolled way or that does not report a disconnect properly is defective in the sense that all functionality dependent on connection scoped state is unavailable. The opposite case also exists - a client that reuses existing connections may have connections that have an unclean state, which may or may not be a security risk.
Participating: Gerardo Capiel, who was recently hired to be vice president of product management for the MySpace Open Platform, Gartner analyst Ray Valdes plus my 2 cents.
"Developing for MySpace is tough because the platform is still not stable; it has many issues and bugs which remain unfixed for months, Lubarsky said via e-mail. In its current state, the platform doesn't give enough confidence to developers."
IE7 is an absolute and utterly humiliating failure. It was very sad to see it fails over time taking into account that I'm usually a Microsoft fan when it comes to products and tools. It introduced some good usability features which already existed in FF but not in IE6 because IE6 development was frozen for quiet a while. The short list of features includes tabs, Ctrl-Enter, etc. The IE team should not have touched IE engine though. The browser became unstable and buggy. I use it only due to the fact that the majority of my users are on IE7.
IE8
IE8 is more solid than IE7 at a first glance. I installed it because Microsoft has moved it to the high priority updates last week. Too early. IE8 has many issues and I was forced to uninstall it. It needs service pack ASAP and I can't recommend to install it yet. In addition to some severe connectivity issues it causes many popular sites not to work properly with it. It has severe back compatibility issue.
Twitter got some hype recently because the site has become finally more stable and succeeded to reach some critical mass of valuable users/connectors. Many celebrities have joined Twitter recently. You can follow one of them here:
I'd like to introduce Chocolate Lovers application which is connected to Facebook via "Connect". Means that this version of Chocolate Lovers is a standalone application while users need to have Facebook account in order to authorize on and use this application.
Love chocolate? Stick chocolate to your profile and send to friends!
On Chocolate Lovers users are able to send their favorite chocolates to their friends on Facebook, stick chocks to their Facebook profile, view their send/receive stats and even more.
Now Sun has been bought, and not by IBM or Cisco, but by Oracle. In the aftermath everbody is asking themselves - what happens to MySQL?
Well, firstly MySQL and InnoDB are now part of the same company. Oracle has been maintaining InnoDB pretty well in the past, and that can only improve.
Will Oracle let MySQL die and try to push their on products into the market? Hardly so. Why should Oracle do that, and if so, using what products?
Sun Microsystems, Inc. today announced MySQL™ 5.4, a new version of the world's most popular open source database, designed to deliver significant performance and scalability improvements to MySQL applications. A preview version of MySQL 5.4 is available now for download at http://www.mysql.com/5.4.
Today's announcement was made at the seventh annual MySQL Conference & Expo being held this week at the Santa Clara Convention Center. With more than 2,000 attendees, it is the world's largest event for open source database developers, DBAs, vendors and corporate IT managers.
Sun Microsystems and Oracle Corporation announced today they have entered into a definitive agreement under which Oracle will acquire Sun common stock for $9.50 per share in cash. The transaction is valued at approximately $7.4 billion, or $5.6 billion net of Sun's cash and debt.
Want to learn how open source software helped Barack Obama get elected president? Don't miss the closing keynote at the MySQL Conference & Expo 2009, "Database We Can Believe In: Stories from the Front Lines (and Server Rooms) of Barack Obama's Online Presidential Campaign."
Co-presented by Sun Microsystems and O'Reilly Media, the seventh annual MySQL Conference & Expo explores "Innovation Everywhere" April 20-23, 2009 at the Santa Clara Convention Center in Santa Clara, CA. With more than 2,000 attendees, it is the world's largest community event for open source database developers, DBAs, vendors, and corporate IT managers. For more information, please visit www.mysqlconf.com.
There are a finite set of cases where mySQL clients will hang on a connection-for a small period of time (seconds) or indefinitely. Most notably no-route to host causes a timeout to occur which in most clients are on the order of seconds to recovery.
In a web environment if a database is connected to on the fly, each connection made should take milliseconds-when the net is healthy. However, when a database server crashes a timeout for each connection takes seconds and there are cases where connections can just hang (recovery of INNODB as an example). For instance have you ever had a ssh session hang and a kill (SIGHUP..) does not work? This happens for mySQL client connections too.
I want the front ends to recover gracefully from a no-route to host, or more notably a hanged connection condition. I want to avoid that damn timeout all together on stateless connections. Why? Because if you use up all your worker httpd threads requests Fail-things crash, bosses get mad, everyone wakes up and you have a meeting about it later. So what are ways around this?
Use a Load balancer
This is cool and all, but is expensive since two or three are needed for HA purposes. In addition, this increases the complexity of managing servers, and most LBs are optimized for HTTPD traffic, not raw TCP traffic.
Use a memcache layer to keep DB state
This is cool as well, but the state server, which collects the state of your servers and updates memcache, is now a single point of failure-and a lot of time is spent to make this resilient. Not to mention all clients across Programming languages (bash scripts, Java, PHP, Python, etc) need to have access to this memcache layer and logically handle the connection the same. This is doable, but there is an easier way.
Use IRON DNS:
IRON DNS is a term I use to build an HA Resolver. IF a box fails a health check, a nagios event handler can tell IRON DNS to set the domain names IP address to 127.0.0.1 or a routable interface that produces a Connection Refused (111 instead of 110).
For an internal network, all database entries should be able to fit in memory. All resolves should take less then 2ms, and updates to DNS entries can happen in just a few seconds across your entire farm. The only drawback is if the DNS server fails to respond your site is boned. Making DNS Failure proof is rather easy-which is another post entirely.
For my solution, I am using DNS. This allows me to recover servers in a shard on different boxes without having to change code. This also allows my environment to recover from blocked I/O events that can spike load on the front-end that make requests slow.
Here are some conditions where connections take to long:
No Route to Host
INNODB recovery
Disk Fails (disk fills up)
Switch Fails / Flaps
Plug-in storage engine crashes, yet mySQL is up
DNS Resolve fails
If you have a better method, or want to advise me on flaws I should watch out for, drop a comment.
Sun Microsystems today announced that Tema Networks has selected Sun's MySQL Embedded Server to meet to its demanding database requirements. Tema Networks' Home Zone Billing (HZB) telecom solution handles over a thousand queries per second and tens of Gigabytes of data, with rapid data volume growth expected as the service is deployed to more regions.
Memcache is a fantastic Hash table-very fast and one of the great successes of Brad Fitzpatrick-who in my opinion has done more for the open social movement as an individual then anyone else. I use memcache quite extensively, now I am thinking about adding ZLIB native to compress the value of each key-much like how INNODB does with the Barracuda file format. The theory is with a CPU hit, we can store more data per memcache instance. I've talked to the Northscale guys and they love the idea. What do you think?
for PHP for instance compresses the data on the client side.
There still might be some value compressing the data on the server-side, but now I'm not as motivated.
What might be a good alternative is to compress keys into 8 byte longs in memcached automatically instead of the actual string that can be huge. To give some more detail, 8 byte longs is a 64-bit int. A string can easily be converted into a big int by bit manipulation - and the address space is huge so key conflict is effectively removed.
Cloud computing is the big thing now days, weather you are an app developer using EC2 or the Google APP engine, or a new company trying to build your own cloud product. If you are hosting, or using a platform it costs money. I hate to spend money, especially money that is for my company. If I find an idle server, I use it to 100% utilization (prior to the saturation point).
I needed to build a new application that periodically crawls a website to update various lists. Building a crawler is expensive, especially from scratch. First, you have to define the amount of lag that is allowed from the crawl copy and the real copy. Of course the Project Manager does not want any lag, all events must be caught and near real-time without overloading the source of the data-but I am not hating, it is a challenge. Next, what technology to use, what language to write the app in-what considerations are left to be defined? How does one crawl Gigs, Tera, and amount of data in a guarantee period? On top of that, how much additional hardware is this going to cost. In addition, to be a cloud it needs to have an API so app developers can set, and get consistent data in an expected period. This is a lot of freaking requirements.
Therefore, to solve this issue, I know that mySQL will store the data, but getting the data is the hard part. This is what is going to cost money, lots of it. I looked around common architectures and found that nothing would do what I wanted to do in a cost effective manner. So, I designed my own using Seti @ Home as the basis for the design.
Get to the point already Dathan:
I have turned ever user who views my applications into a collector, using idle bandwidth without knowing who is collecting the data. My user base on spare cycles will fetch a feed of my choosing, and sends that data to my servers without any personal information. Instead of using an Amazon or Google service, I have turned my user base into a cloud to service their needs.
It is rather awesome-I must say. I am able to service the needs of more than 60 million users at the cost of development time, and NO NEW HARDWARE. The cloud does not have to be a service provider-it can be the end user as long as the end user is not impacted by the requests. BTW the team that I manage is freaking awesome-they built my vision with trial and error and a hand waved spec.
Currently the system scales as long as there is enough end users. If I lose all my users then well I am boned, but to support the feeds all I need is 100K nodes at the current rate. With 60 million end nodes, I am cool.
Imagine if Google with Adsense used this install base to tell Google if the data has changed for an arbitrary web address. All it needs is a few people to hit the same url, inform Google that the web address in question has a different checksum, then at that point Google’s crawlers can go fetch it, instead of constantly crawling data that doesn't change. Google would be able to reduce overall server cost significantly, if it just knew what data has changed instead of guessing what data has changed.
By next years Velocity conferance I hope to have a full disclosure on what technology my team used, how my team get around cross domain issues, and how to compute checksums to validate the data.
PS - I designed this, with my team we made it much better and one person implemented it and owns the product from this point on.
Where I work, Merlin is an important tool for us and provides a lot of insight that other, more generic monitoring tools do not provide. We love it, and in fact love it such much that we have about 140 database agents reporting into Merlin 2.0 from about 120 different machines. That results in a data influx of about 1.2G a day without using QUAN, and in a data influx of about 6G a day using QUAN on a set of selected machines.
It completely overwhelms the Merlin data purge process, so the merlin database grows out of bounds, which is quite unfortunate because our disk space is in fact very bounded.
The immediate answer to our purge problem was to disable the merlin internal purge and with the kind help of MySQL support to create a script which generates a list of record ids to delete. These ids end up in a number of delete statements with very large WHERE ... IN (...) clauses that do the actual delete.
I will be attending Community One tomorrow and on Thursday at Marriott Marquis Hotel, New York, NY. I am especially looking forward to the announcements tomorrow which sound very interesting :)
The first day is a free event featuring:
* Cloud Platforms – Development and deployment in the cloud. * Social and Collaborative Platforms – Social networks and Web 2.0 trends. * RIAs and Scripting – Rich Internet Applications, scripting and tools. * Web Platforms – Dynamic languages, databases, and Web servers. * Server-side Platforms – SOA, tools, application servers, and databases. * Mobile Development – Mobile platforms, devices, tools and application development. * Operating Systems and Infrastructure – Operating systems and virtualization. * Free and Open – Open-source projects, business models, and trends.
The second day of the event is focused on Deep Dives with two half-day sessions on MySQL and two full-day sessions on Java and Web development. I will be attending the session, "Using Java EE and SOA to Architect and Design Robust Enterprise Applications."
Following the conference, I will be a panelist at a Cloud Computing Seminar at Microsoft office in NY. It's going to be a long but exciting day!
It will be great to catch up with old and new friends at the event.
Walking a table means, traversing each row, commonly used in building queues, fixing data, or dumping a table. I've recently ran into a problem-caused by an assumption, where walking a table was taking way to long using the method
$pos = 0; do {
$result = SELECT col FROM TABLE LIMIT $pos, 1000; $pos += 1000; } while ($result);
The assumption was since INNODB uses a cluster index, this would traverse the table using the PRIMARY key. This is not the case, its not a problem in INNODB but a bad assumption, that I fell victim to. A table scan to each $pos occurs producing a Big-O of N^2. So, when the query:
SELECT col FROM TABLE LIMIT 1000000, 1000 is executed mySQL will scan all the rows up to row position 1001000 and for each subsequent iteration.
This is SLOOOOW. IMHO since the table is sorted by the primary key, mySQL should optimize this case - but it does not and will not. So, to walk an INNODB table fast, and keep liner time or a Big-O of N an alternative is
$last_id = 0 do {
$result = SELECT col FROM TABLE USE INDEX(PRIMARY) WHERE pkey_part > $last_id LIMIT 1000 $last_id = $result[count($result) - 1]->pkey_part
}while($result);
This dumps a table very fast, almost as fast as doing a count(*) on the PRIMARY KEY.
Another method is to
SELECT col INTO OUTFILE "/dir/file.ids" FROM TABLE;
but the data is local to the database - thus the need for the application to grab data. Another draw back of this method is that the dump produces more disk IO then walking a table off of a key, slowing down access to this table.
In conclusion, even if the storage engine keeps the table order consistent like INNODB does, do not assume that LIMIT 100000, 1000 is equivalent to a file seek of position 100000, without telling the Optimizer to use an index.
Tomorrow, I'll be attending the Executive Seminar on Cloud Computing at NASDAQ MarketSite (NY). Speakers include Dr. Werner Vogels and Mårten Mickos (ex-CEO of MySQL). Big thanks to Amazon and RightScale who were able to accommmodate my RSVP even when the registration had formally closed.
I hope to be able to catch up with Mårten Mickos during the event. In case I do succeed in catching up, is there any question you want me to ask him? You can email me or post a comment.
It's funny that the event site still shows Mårten's title as "SVP of Sun Microsystems’ Database Group."
Came across an interesting post by Bret (co-founder of FriendFeed) about how FriendFeed uses MySQL to store "schema-less" data. According to the post, they weren't having issues with scaling existing features but rather they were experiencing pain when trying to add features.
Now the way they are using MySQL is interesting and bizarre at the same time. At a very high level, it seems their approach is to use a RDBMS as if it is a column-oriented database. Of course, it makes me wonder why not just use a column-oriented database? I need to read the post again in the morning (too tired right now so just gave it a quick glance).
I am very interested in hearing thoughts from my peers at Planet MySQL regarding this approach. They seem to have gone great lengths to go this route. What issues and benefits you see of this approach and whether you ever see yourself taking this route? I, for one, am not entirely convinced of this approach and whether it can really scale down the road. Also, if it was someone other than Friend feed going down that route, I might have actually lost my tempered and yelled :)
Side note: Friendfeed is growing fast, and it would have been cool if Bret was speaking at one of the three upcoming MySQL events in April.
I'll be going into detail what is Sharding, how to Shard, pitfalls of Sharding, performance/throughput gains, shard roles, and performance scaling in general. I hope to make this the most comprehensive talk to date on the subject in 45 min.
The topic is called Scaling a Widget Company. I'll detail how I setup the data layer for Rockyou. How many transactions per second Rockyou is at, what the infrastructure is comprised of, how 99.999% uptime is achieved and hopefully get into BCP which I probably will not have time to go over.
If you want me to focus on specific aspects on the subject of shard'ing let me know and I will :).
Request time is proportional to server load. If the application response time is big so will be the server's load. To reduce server load, reduce the wait time in the application's response time to serve the request. Below are some steps that I took to remove 1.8 ms overhead on every request to my web server farm.
Tools Needed: vi strace top
Use vi to look at your include path in php.init. Next use top to find which apache process is consuming the most cpu resources. Use strace -p [TOP HTTPD PROCESS] -T (-T is for deltas).
In my example the include path is /usr/share/pear:/usr/lib64/pear:.:/var/www/html/httdocs/
lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000033> lstat("/usr/share", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/share/pear", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/share/pear/ams", 0x7fbfff1690) = -1 ENOENT (No such file or directory) <0.000033> open("/usr/share/pear/ams/include/FreqCapInfo.php", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000036> lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000037> lstat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=65536, ...}) = 0 <0.000034> lstat("/usr/lib64/php", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000034> lstat("/usr/lib64/php/pear", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000038> lstat("/usr/lib64/php/pear/ams", 0x7fbfff1690) = -1 ENOENT (No such file or directory) <0.000037> open("/usr/lib64/php/pear/ams/include/FreqCapInfo.php", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000038> open("/var/www/html/ams/include/FreqCapInfo.php", O_RDONLY) = 24 <0.000043> fstat(24, {st_mode=S_IFREG|0775, st_size=6707, ...}) = 0 <0.000031> stat("./ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000037> stat("/usr/share/pear/ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000038> stat("/usr/lib64/php/pear/ams/include/FreqCapInfo.php", 0x7fbfff4778) = -1 ENOENT (No such file or directory) <0.000036> stat("/var/www/html/ams/include/FreqCapInfo.php", {st_mode=S_IFREG|0775, st_size=6707, ...}) = 0 <0.000040> close(24) = 0 <0.000033> mlock(0x552b876be0, 24) = 0 <0.000092> mlock(0x552b8df910, 10624) = 0 <0.000038> munlock(0x552b876be0, 24) = 0 <0.000036> munlock(0x552b8df910, 10624) = 0 <0.000032> mlock(0x552b876be0, 24) = 0 <0.001707> mlock(0x552b8df910, 10624) = 0 <0.000009> munlock(0x552b876be0, 24) = 0 <0.000007> munlock(0x552b8df910, 10624) = 0 <0.000007>
Looking at the strace, 15 unneeded system calls are made on every request, each request roughly takes 30 micro seconds, for a total of a few ms wasted on every request. False positives are adding overhead to ever requests since the include path is not optimized. A Bloom Filter in Shared Memory would be perfect for for this part of PHP-but that's besides the point.
So, change your include path .:/var/www/html/httdocs/:/usr/share/pear:/usr/lib64/pear
For my example I changed the include path to the above. In my environment we don't do many PEAR loads, so it makes sense to use our directories 1st.
By doing this 15 erroneous system calls have been removed. Note: If you use the php feature __autoload make sure to protect your calls with file exist or you may be doing a require once on a file that is in a different directory which is a PHP fatal Error.
Another project that I am doing introduces a new shard type. First, think of a Shard as RAID-10 on your database layer. The data is striped across N servers and mirrored for failure recovery.
Shard Types are Shards that serve specific purposes. For instance, I have an archive shard. This shard keeps data the is rarely ever read on even cheaper hardware-with slower disks and the data is naively compressed using the innodb_file_format=Barracuda option. If and/or when a person requests data that is archived, my software layer detects this case and migrates the archive data from the super crappy RAID-5 servers to the less crappy RAID-10 servers with 32 GB of RAM, all within 5-10 seconds.
Another shard type is the sub-shard. I use this terminology to define a different way of federating data for a predefined global object_id. A predefined global object_id is what you are federating by, for instance userid.
userid X maps to Shard 3
What if you had an application that is expected to take up 8TB of data? Sticking all of userid X's data on a single shard is very costly since current size of the cluster + 8TB means more servers are needed and is costly (more data, larger ranges, slower the avg query speed). So why not put it on another set of servers that have big slow'ish disks, i.e. the Sub-Shard.
Now for userid X
userid X maps to Shard 3 userid X maps to SubShard 1
Since some shard profiles do not require 100% uptime, I can run builds of mySQL that is cutting edge.
Using the Percona 5.1 build, I ran across new tweaks for XtraDB. The patch is from the Google Patch made by Mark Callaghan's Google Team. Mark's architecture is purely disk I/O bound, and from his tests, INNODB does not use I/O effectively. He added a bunch of code dealing with the I/O performance bottlenecks that innodb native has. The three new tweaks addressing the I/O bottleneck specifically are
innodb_io_capacity = 100 // If running without the patch this is the equivalent default behavior innodb_write_io_threads = 1 innodb_read_io_threads = 1
From what I gather, this means that DEFAULT INNODB uses one write thread 1 read thread and will only work on 100 pending iops at a time-which is probably why INNODB takes sooo long to shutdown since by default 90% of the buffer pool contains dirty pages.
If your servers have a large innodb_buffer_size, on the order of a few Gigs and the server has many spindles (greater then 1) then increasing these params may help your application purge dirty pages faster. Be warned; do not increase this to high. If your box has a lot of pending I/O and this new code does not have the bandwidth to flush based on your settings, the box will freeze. That means you will have to remove traffic off of the server, let the pending io finish, lower the settings and restart.
So, what is a good size to set this to?
Percona says 100*Number of spindles you have on the db server. That seems a bit high for me. I would say 100*Number of spindles / 2 - to start out with, then over time ramp it up.
Currently I am about to run 10 Shard or 20 servers with the Percona build. This will purely I/O bound load.
In my previous blog post I talk about GROUP BY and ORDER BY optimizations. A member asked a great question that I'd like to share with everyone.
But what if the query was:
SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2 ORDER BY c3 DESC LIMIT 10;
That query would produce a temp table and a filesort. explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2 ORDER BY c3 DESC LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: column_test type: ref possible_keys: c1 key: c1 key_len: 5 ref: const rows: 1 Extra: Using where; Using index; Using temporary; Using filesort 1 row in set (0.00 sec)
The reason the index is c1,c2,c3,c4
So where c1=? and the group by of c2 would use that index, but to order the data properly you would need to do
WHERE c1 = ? GROUP BY c2, c3 ORDER BY c1 DESC, c2 DESC, c3 DESC
explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2,c3 ORDER BY c1, c2\G
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: column_test type: ref possible_keys: c1 key: c1 key_len: 5 ref: const rows: 1 Extra: Using where; Using index 1 row in set (0.00 sec)
to get rid of the temp table or filesort. Filesorts and temp tables takes about 50% of the query time-so avoid these when the query is requested at a huge rate.
Also note that to get rid of the temporary table and filesort, the query changed and does not answer your question without post processing the data in PHP or some other layer.
The reason: the data is ordered by the index. For innodb the entire table is ordered by the PRIMARY key, each index has a reference to. The mysql implementation of group by does not know how to traverse and sort the data in the 1st pass of part of the key. The optimizer needs a lot of work. So, what is done from the mysql level is to automatically create a temp table and sort on that instead of using the index which it already traversed.
I believe in 5.1 that this case is being worked on in the optimizer level to get rid of this common slowdown.
When you write good SQL, that use indexes properly there is one more obstacle that can slow down your app. The mySQL optimizer. From versions 3.23 to 5.1 the optimizer has been a problem for me. In mySQL 6.0 SUN/mySQL has resources improving it.
I wrote a post detailing how to pick indexes to get the most out of mySQL here.
Here is a post about the mySQL optimizer and what you can do to speed up your SQL SELECT statements.
What I would like to share with you today, is that UPDATE and DELETE statements can also use optimizer tricks that SELECT uses. Its not documented on the mysql.com but it is possible to do something like
UPDATE [YOUR TABLE] USE INDEX(`your_index_name`) SET col='val' WHERE [columns that satisfy your index].
To see if you have problems with your UPDATE statements taking to much time I recommend 1st running INNODB as your storage engine since SHOW INNODB STATUS will indicate what index your long running UPDATE is using. This can be achieved by looking for /RECORD LOCKS space id.* index/ of SHOW INNODB STATUS.
Another indicator that you might need to force the optimizer to do the right thing is to track "Deadlock detected" from concurrent update statements; look at the where clause of your UPDATE statement.
Next solve your issues by telling mySQL to use the correct index when setting exclusive locks via the USE INDEX statement above.
Oh how I love register_shutdown_function of PHP. This bad baby will execute at the end of the scripts in call order. So, how is this useful?
Say you want to log some action, but that logging DB is on another server. Additionally you do not want to take into account that writing to the other server may break your transaction because of timeouts if logged in the middle of a transaction. Another use case: you do not want to change a bunch of code in different places to batch all logging routines up.
public function __construct($platform){ parent::__construct($platform); register_shutdown_function(array($this, 'afterProcess')); }
public function afterProcess(){
foreach($this->getDataThatChange() as $key => $values){ $insert_data[] = $values; }
I have been using register_shutdown_function for years now, especially to clean up connections at the end of script execution. Since someone asked me if this was possible in PHP, I decided to post this quick usage of a cool php method.
Sun Microsystems, Inc. today announced the general availability of MySQL™ 5.1, a major new release of the world's most popular open source database. Designed to improve performance and simplify management of large-scale database applications, the production-ready MySQL 5.1 software is available for download now at dev.mysql.com/downloads.
If you have a slave, that slave is probably running with the read-only flag set in the mysqld-section of your my.cnf. To be able to write to a read-only slave you need to be the replication SQL_THREAD or have SUPER privilege.
Since 5.0.16, it is still possible to execute CREATE TEMPORARY TABLE on a read-only slave, so CREATE TEMPORARY TABLE privilege also allows you to write to a read-only slave in a limited and controlled way.
If you want to process a lot of data in a temporary table, you are probably creating the temporary table without any indices, then INSERT ... SELECT data into it, and then ALTER TABLE ... ADD INDEX afterwards, because that is usually faster than to insert data into a table with indices. Only that you cannot ALTER TABLE a temporary table, even on a server that is not read-only - in order to run ALTER TABLE on any table, even temporary onces, you need ALTER TABLE privilege which you might not want to give out lightly.
There is no reason at all to check ALTER TABLE privilege for an alter table operation on a temporary table, because that table is visible only to your connection and cannot be shared. It is also deleted when you disconnect. In fact there is no reason at all to check permissions for temporary tables. But it is done.
Because it is done, you can grant ALTER TABLE to a single table that does exist or even to a table that does not yet exist. If you grant ALTER TABLE privilege to an existing table on a read-only server, you cannot alter that table, because the server is read-only. If you then use CREATE TEMPORARY TABLE to create a table with that name, the temporary table will shadow the existing persistent table and for your connection the persistent table will become inaccessible.
The semantics of the GRANT will change, though, and will now apply to the temporary table, which is writeable on a read-only server because it is temporary, and is alterable because of the grant which was not meant to apply to it in the first place. Problem solved: I now can ALTER TABLE my temporary table on the read-only server after I have finished my data load.
All is well? Not!
There are multiple things at work here which I consider broken:
GRANTs are applied to temporary tables. This is not making any sense at all in my book. Temporary tables are connection-local objects and they cannot have grants applied to them which were always referring to persistent objects when they were made.
Temporary tables can shadow persistent tables in the namespace of a connection. Because GRANTS are tied to objects via the objects name and not an objects UUID or another form of truly unique object-identifier, GRANTS can refer to changing objects even when the grant does not change. This feels somehow broken, as in "not properly normalized". Does RENAME TABLE edit grant tables as well? I have to check!
By granting CREATE TEMPORARY TABLE privilege to a user I am allowing that user to shadow any other object within a schema. The temporary table will then pick up any rights granted to the shadowed object for the duration of its lifetime. This cannot be good.
Sun Microsystems, Inc. today announced a major new version of its MySQL Enterprise™ subscription service to help corporate users dramatically improve the speed and uptime of their MySQL™ database applications. MySQL Enterprise subscribers now have access to Sun's new MySQL Query Analyzer tool for monitoring query performance to accurately pinpoint and correct problem SQL code -- in addition to MySQL Enterprise's regular software updates and 24x7 production technical support.
Thirty-day trial subscriptions for the new MySQL Enterprise service, including MySQL Query Analyzer, are now available free-of-charge from www.mysql.com/trials.
So the good news is that the inaugural OpenSQL Camp is going to be an awesome event with mouth watering sessions by noted experts. The bad news (for me) is that I won't be attending it, which makes me sad. I cannot leave my town because my wife can go into labor anytime now.
The session list looks great! Congratulations and thanks to Baron, Sheeri, Ronald, all the sponsors and contributors for organizing the first OpenSQL Camp.
I will be watching PlanetMySQL closely for juicy blog posts. Hopefully, the one and only Sheeri is taking her camcorder!
Today I discovered Stack Overflow, a collaborative site that focuses on technical Questions. You can ask questions related to any language, apparently without having to register. The site is currently in beta. There are also a few MySQL questions that are currently unanswered.
I am currently tasked with writing Software Requirements Specification (SRS) document for a project. Effective sharding (based on specific criterion) and Scalability are key requirements of the project.
Scalability is traditionally classified as a non-functional requirement. My question to the community is that if scalability is crucial to a project, would it still be classified as a non-functional requirement? Are their cases when scalability requirements would be best classified as functional requirements?
Sun Microsystems, Inc. today announced Sun Systems for MySQL™, a set of solutions designed to radically change the economics of Web service delivery for enterprise customers deploying MySQL-based Web Infrastructure on Sun servers. Proven customer deployments have shown the ability to improve performance by 300 percent, reduce power consumption by 83 percent and offer up to 10 times better price/performance, with higher system reliability and faster time to market.
To view a video of John Fowler, executive vice president of Sun's Systems Group, and Marten Mickos, senior vice president of Sun's Database Group, discussing the new solutions, please visit: www.sun.com/mysqlsystems.
I wrote an application that is able to send out 3-8 million messages an hour with only 10 CPU's. This application is a part of an Offline Task system that scales linearly.
How is this done, I'll go into detail hopefully at the mySQL conference if they accept my proposal.
The scope of this blog post is to go over building the "Task Queues". Currently I have 13 Queues, one queue for each Shard that I run. The data is federated by user or randomly federated with a GUID that lives as long as the job. A request came in to add 20 million jobs to the queue all at once. The problem is with this list, will I cause deadlocks in innodb as I add the jobs to the queue as one transaction? Can live traffic still write to this queue?
To verify that Deadlocks will not occur - having an understanding about how locks work in INNODB is key. I suggest reading this page.
To build the queue I dumped the data source by
SELECT identifier, 16 INTO OUTFILE "/data/mysql/BuildQueue.log" FROM SOURCE_TABLE WHERE CONDITION.
The isolation level is REPEATABLE-READ; I'm setting a shared lock to get the most current version of the data. Writes are not blocked.
Next:
I create a table on each shard where the queue is located.
CREATE TABLE IF NOT EXISTS OfflineTasksHold ( object_id bigint(20) NOT NULL DEFAULT 0, object_type smallint unsigned NOT NULL DEFAULT 0, PRIMARY KEY(object_id,object_type) ) ENGINE=INNODB;
Then I issue a command on each Shard
LOAD DATA INFILE "/data/mysql/BuildQueue.log" INSERT INTO TABLE OfflineTasksHold; START TRANSACTION; INSERT INTO OfflineTasks (object_id, object_type) SELECT * FROM OfflineTasksHold; COMMIT;
Each shard is getting around 4-5 million rows, while accepting real-time traffic of 20-60 tasks a second to the OfflineTasks table. The OfflineTasksHold table does not have any real-time requirements and is solely used to keep the queue in Primary Key order, plus there is a CHANCE that LOAD DATA could set an exclusive lock on the OfflineTasks table-shutting down adding data to the table by the live site. The INSERT sets an exclusive lock on the rows that are being added, so the Offline Task Sheppard - the process that pops tasks off the queue is blocked for a small period - which is acceptable. Why are they blocked? Well, the massive insert sets an Exclusive Lock. The Sheppard is trying to grab the rows that are locked waiting up to 50 seconds, until innodb_wait_timeout is reached. This condition is acceptable. All other inserts are able to go into the queue without a slow down.
In summary, the job queue is built and can be automated with confidence knowing that death to the various app will not occur. Processing slows down for a bit but speed right back up.
In this program, featured experts from Gartner and Sun Microsystems discuss the adoption and best practices of open source databases in the enterprise. You'll also learn how Zappos.com, a highly successful e-commerce site, saves over one million dollars a year and stays on top of its explosive growth by subscribing to Sun's MySQL Enterprise database subscription service.
Today's "commercial quality disks" are amazing but they follow the same limitations as yesterday's disks. mySQL scales very well, but disks do not. So if you're IO bound when will your expectation of speed fail?
Test setup:
DELL 2950 PERC-6 HWRaid BBC 6 DISK 15K RPM 3.5" RAID-10 256K stripe across two channels-using WRITE THROUGH CACHE on mkfs.ext3 -T largefile4 Linux Filesystem.
The theory is that the outer part of the spindles is the fastest, and the inner portion is slower - since the outer is where the data starts (thanks for the info Benjamin Schweizer). Thus one can conclude that the more disk space your application(s) use the slower the throughput, since the heads have to move more. Brad F. my co-worker did a benchmark to prove this. Our goal is to find out at what is the saturation point if our expectation is to have 22 MB / sec of random access.
Why do we want 22MB / sec of random access throughput? We want to guarantee a certain level of performance when adding new apps to a common backend-which is I/O bound: we need to know when things will break.
Here is what Brad found: Total disk size for our RAID-10 setup =~ 800G. What point does it FAIL to achieve our expectations of sustained 22MB/s?
rndrw test across 100G test / 750G LV =~ 35 MB/s # outer part of the spindles rndrw test across 100G test / 300G LV =~ 32 MB/s # outer part of the spindles rndrw test across 250G test / 300G LV =~ 24 MB/s # sweet spot rndrw test across 350G test / 384G LV =~ 21 MB/s # saturation point rndrw test across 750G test / 800G LV =~ 14 MB/s # waste of space
In conclusion these test show that even though a RAID-10 setup with 800G of space is available, the expected performance drops when data exceeds the sweet spot of 250G-300G of 800G usable-data array.
Disclaimers: There are many factors that can raise or lower the bar, like different file systems, different I/O schedulers, flushing. For my setups I like
Deadline I/O scheduler 256K Stripe few inodes (don't need them) ext3 since that’s what stable and available.
Tomorrow morning I am presenting a session, Startup Scalability Strategies, at Startonomics, a conference being organized by Dave McClure and Deal Maker Media. The sessions will be streamed live using UStream. Check the Startonomics website at http://startonomics.com for more details.
Let me first start of with the disclaimer, that I do not use memcache to scale, I use it to reduce latency. I'm of firm belief that the database layer should be able to handle the requests, while memcache is used to keep frequent requests returning in a consistent time frame i.e. reduce I/O spikes.
Capacity planning is key to making sure your site can serve the requests to users. If the site is slow, or down that is loss revenue in any revenue model used to monetize your product.
How to determine when you need to add more memcache servers.
The stats I look at are system stats and memcache stats.
Memcache is Memory / network heavy. CPU spikes are very low, and if the CPU starts maxing out that is probably due to some sort of network driver issue or huge context switching or large values stored in memcache.
This server dedicated to memcache. The context switching is huge due to all of the constant requests-but we are talking about modern day CPU's which can context switch like crazy. The thing that is bugging me is that requests are starting to go into the run queue, not at a alarming rate but still this is an indication of some possible issue.
This is something that is graphed on ganglia. If the run queue on average starts increasing, there is some problem.
Next stats from memcache.
/** * @desc get extended status from all servers */ public function CacheGetStats(){
if ($GLOBALS[cfg][disable_feature_memcache]){
return true;
}
return $this->memcache_obj->getExtendedStats(); }
I have a class called Cache which is a wrapper around memcache class calls. Cal Henderson would kill me if I was using classes at Flickr. Don't get me wrong I agree with Cal 100% but the environment I am in now requires classes-so I have to use it. The reason why we don't like classes is for another post.
Notice on this server we have a good hit rate and no evictions. Yet looking at one server is not good enough, look at them all- the reason more memcache servers means more memory to store data for your application. The CRC32 hash that the PHP memcache client uses is not very even and some keys may be requested more.
Take a look at this server. The evictions are high, indicating that memcache needs to make room for new objects. This is not good, its an indication that the LRU is evicting objects out faster then their expire time. Additionally the memcache gets are much greater then the hits. This is an indication that memcache is not really working as good as it can.
But one server is not an indication that there is a problem. Looking at the system as a whole is to determine if a problem exists. My rule of thumb is if the 30-40% of the servers have a high eviction rate, its time to add 30-40% more servers or memory.
Now allot of this can be tuned by changing the slab size, but learning from John Allspaw, don't make a plan based on a possible gain, make a plan based on the current usage. Then if the possible gain works your golden.
How do you base your stats on adding more memcache servers?
I had a lot of good experience using fabforce's DB designer. Now that I upgraded to mySQL workbench, I have nothing but good things to say.
In 5 mins I was up and running after install. I took my mysqldump of my schema
mysqldump -d --no-add-drop-tables
Clicked File->Import->Reverse Engineer MySQL Script
Tada a picture of my schema.
Pictures are always nice to represent what your conveying to individuals who may not be as experienced with the db structure as yourself. Just for this feature alone I recommend using Work Bench.
I attended several events in June of this year including Graphing Social Patterns East, Velocity and Structure 08. At each of these events, I tried to take some notes and posted them to my personal blog. I received a few pings from readers of this blog to point them to a list of these posts. It took some time but here is the list of my notes. In some cases, I have linked directly to the presentation files.
Working the Clouds: NextGen Infrastructure for New Entrepreneurs: This panel on cloud computing featured panelists including Geva Perry (GigaSpaces), Jason Hoffman (Joyent), Tony Lucas (XCalibre), Lew Moorman (Rackspace), Christophe Bisciglia (Google / AppEngine) and Joe Weinman (AT&T). Christophe got grilled heavily by other panelists but he handled it pretty darn well.
Werner Vogels: Keynote at Structure 08: Dr. Vogels keynote was one of the highlights of Structure 08. He presented case study of Animoto and talked about the 70/30 switch among other things.
The Platform Revolution: A Look into disruptive technologies: Jonathan Yarmis AMR Research (VP of Disruptive Technologies) talked about technology trends, social networking, mobility, mobile , cloud computing, stream computing, business models, user 2.0 and the new enterprise reality.
Green Data Centers: Bill Coleman (Cassatt Corporation) presented this session. Bill is known for being responsible for the B in BEA. (Presentation slides)
Creating Bebo Applications: Bebo is now part of OpenSocial and this presentation presented at Graphing Social Patterns talks about how to create applications for Bebo.
Open Social and Google App Engine: Patrick Chanezon (API Evangelist) and Paul McDonald (Product Manager for Google App Engine) presented a technical overview of OpenSocial and Google App Engine at Graphing Social Patterns East.
OpenSocial: Open for Business: In this session, panelists were Patrick Chanezon (Google), Paul Lindner (hi5), Max Newbould (MySpace) and Sachin Rekhi (imeem). (Also see)
Mobile Social Networks: A Comparison: Benjamin Joffe's excellent eye opening session for anyone interested in using mobile platform for creating social networking solutions.
Top 5 Things that fail and win on social networks: Dave McClure, chair of Graphing Social Patterns, presented this concise but every effective presentation on what fails and what wins on social networks.
Widget Strategies & Social Platforms: Hooman Radfar, CEO of Clearspring Technologies discussed the new role of widgets and how to go about creating them.
Scaling MySQL - powered Web Sites by Sharding and Replication: Slides from Peter Zaitsev's session at Velocity. (Presentation slides)
Capacity Management: John Allspaw's signature presentation on capacity management. John also has a book coming out on this topic. (Presentation slides)
LinkedIn Communication Architecture: Slides about LinkedIn's platform built in Java. (Presentation slides)
SOX Compliance: A presentation by Skye Rogers. I missed this presentation but then caught up with Skye at dinner. I wish Skye would have received more time to discuss SOX Compliance. (Presentation slides)
There were several sessions I didn't get to go to which is a sad thing. You may want to check the conference websites directly (linked at top of this post) to see if there are presentation slides available. Also, if you took notes at these sessions, please feel free to drop the links as comments to this post.
So running mysql-5.1.24rc with INNODB-Plugin-1a I ran into "Can't initialize function 'INNODB'; Plugin initialization function failed", and noticed a more serious issue with the whole plugin architecture.
Before without plugin if the engine died so did mysqld. Now, mysqld can stay up and accept connections. But what if you have 1000 mysql instances, and all of your Nagios checks where to test if mysqld was up and accepting connections? This entire test is now invalid. I now need a Nagios check to test if the plugin-engine is up, by doing a write to the engine.
Now to the cause of the issue:
Here is my condition. 5.1.24rc with INNODB-Plugin was killed by OOM killer (mysqld memory leak?). When it came back, my expectation was for innodb to go through a recovery process, it didn't. Strange. Then I got this message from replication.
So I did a show plugin command and innodb showed that it was installed. Since I'm out of ideas I thought that I would unload the various INNODB plugins.
Tada found the problem, I could not uninstall them something is wrong with the plugin. So, to make everything smooth for a reinstall of the plugin.
1. TRUNCATE mysql.plugins // I don't know if this is needed 2. restart mysql 3. INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
Now I have an error InnoDB: mmap(26600275968 bytes) failed; errno 12 InnoDB: Fatal error: cannot allocate the memory for the buffer pool
Weird
I look at whats allocated. mysqld is consuming 11GB when nothing is loaded, and the various key buffers are sized very low. So, just to test a theory I lower the innodb bufferpool from 26GB to 19GB. Since 19+11 = 30GB and I have 32GB total.
1. start mysql 2. INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
Now it hangs. But, now this looks a familiar failure, I think its now recovering, so I look in the mysql error log and indeed it is recovering!
First of all let me start off saying that I learned a lot of Capacity Planning from two people. Jozo Dujmovic, and John Allspaw-who by the way is coming out with a book.
Capacity != Performance. You may have the capacity to do a bubble sort but a bubble sort is still a bubble sort.
Really to Scale you need to know when your application will break. I have a tool set to help determine what application is producing what SQL and use that to figure out which SQL is producing the most load on the system. Some common tricks I do is put the execution path automatically as a SQL comment, then sample the FULL Processlist to build a graph on what application, function, SQL pattern is the top load.
On top of that I use Ganglia to trend the use of each mysql box. Key metrics that I use to determine capacity.
From iostat -x
I/O wait atime svctm
If the service average is trending towards 20% I/O wait I know that that is a hard-limit for my server configuration that will cause slave lag.
Jeremy Cole has a good write up and a tool for getting I/O stats that iostat itself does not expose.
If the atime (Response time) is growing, I know that the overall SAN LUNS are saturated. On top of that SAN LUNS typically have larger svctm cutting overall throughput with how MYSQL works. On a side NOTE I despise using SANs for mySQL. Why well that's another post.
Then I have ganglia configured to monitor everything for SHOW GLOBAL STATUS, but really I only look at the following
Next I take the techniques I learned from John Allspaw and build a 3rd order Polynomial and verify that my R^2 is in the 98%tile to see when I need to add more servers. So far so good. Now I have a rough idea when I need to add more servers-a capacity plan. (The techniques involved are various ratios of Users per Application per Shard, busy time, more junk like that)
Now your Architecture allows you to Scale, by ensuring a High Throughput at a low Response Time.
I personally use a architecture that I've started on since 1999-Shard'ing. Brad Fitzpatrick when building Live-Journal really made this concept popular.
With my Federation strategy I've been able to scale some of the most toughest dynamic applications linearly by just arbitrary adding more servers. It takes 5 min. to deploy new DB servers.
So, in summary to capacity plan you need to know how the system works, monitor it and trend it. To scale: your database architecture needs to meet the needs of the app. Are you read or write heavy or both? Do you have a lot of concurrency? Does your app do a lot of sorts? Does your app do a lot of ranges? Is it all of the above? Design to meet the needs, benchmark, know when it will break, and have a plan to recover before it does.
I put this bad boy into production, and its working great.
So, I had an app that was sticking data into S3 when the data was marked or qualified as warehouse data. The problem with this method was when retrieving the data from S3 it was not chunked but appended to. So pulling 100M over the wire, parsing, sorting and adding to the db would fail since the process consumed more then its allowed share of resources. Plus grab data from S3 has a very high response time, plainly put its slow.
So, since the warehouse data is prone to error-the main reason why I needed a new solution, I decided to use ZLIB compression that is native to Innodb's Barracuda build.
I use utf8 for everything, KEY_BLOCK_SIZE will define how much extra overhead the CPU will have to work, think of this as a compression level. I use 4K since some rough benchmarks I've done shows that this is the sweet spot for my INSERT rate + Select rate.
Finally I stick the entire dataset on a slow RAID-5 array. I use RAID-5 because when I want the data-I want to get it fast and put it into the production dataset real-time. I need the disk space and redundancy. The writes are not extraordinary and not very expensive from a I/O perspective. I'm getting 60% reduction in the INNODB table space. On top of that the format of the data is the same format as the production system. So I can pull this data fast write to the shard in a single httpd process. I can migrate archive data to production quality in less then a second for every 3000 rows. I qualify this as a win. We'll see how stable this is over the next few weeks.
So in conclusion, I've replaced S3 with a cheaper solution, faster solution, I put archived user data on cheap servers, and as a result the price per user goes down, since my capacity plan is based off a function of disk growth-on my fast disk cluster. Next I can store my backups on S3 instead of semi-live data :)
I get this question allot, and my response is "it depends", people rarely like this general response. To pick a storage engine, the user must understand the gains and losses of using the storage engine, basically understand what INNODB and MYISAM do, and how they behave under different load patterns.
When picking a storage engine I go through a checklist-below is a quick list to get an idea of the thought process.
Do you require transactions?
If yes use INNODB else you may still want to use INNODB?
Are you doing a lot of big queries that Scan 20-30% of the rows?
If yes use MYISAM. It's better at doing large queries where the query requires a full table lock. INNODB will lock each row as it scans through it which hurts query throughput.
Are you building an app to store 1 row and access said row really fast and at a high concurrency?
See Tickets explanation of the example of an application that does this. Innodb hits a lock bottle neck when operating on the same row from many different threads, so it may not be advisable to use INNODB.
Are you building an app that stores a lot of blob data?
Now this is a grey area. With INNODB native zlib compression in 5.1+ INNODB may be a good choice over myISAM, while myISAM historically uses less diskpace then its INNODB counter-part. So, don't be afraid to create a single INNODB table and 100s of MYISAM tables that hold the blob data for INNODB.
Why do this? Because a single table in MYISAM will likely lock slowing down blob insertion / removal.
Do you require a lot of reads and writes where the ratio is not 90% reads 20% writes but more like 60/40?
Use INNODB over myISAM. If you don't believe me take a look at
show global status like 'table_locks_waited'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | Table_locks_waited | 82721 | +--------------------+-------+
These are generic common questions that personally help me find out whats best to use. In reality to pick the correct storage engine, you should experiment and find out whats best for your app. Understand the storage engine-do some testing, then its easy to pick which is best.
“Funny how Amazon doesn't use S3 to store any assets for amazon.com”tweet by @gruber
Amazon's S3 suffered a major outage today knocking many websites offline. S3 outage started at approximately 12:00 PM EST and the last time I checked at 11:11PM EST, Smugmug, a popular photo hosting site that extensively uses S3, was still down.
Currently, there are several great opportunities with exciting companies available in the New York area. If you're a rock star Java/PHP/Ruby developer or a pixel-obsessed designer, contact me at your earliest convenience.
Web Developer:
Give Real is a well-funded startup in the midst of an exciting period of growth and success. Our technology uses a patent pending platform that combines the ubiquity of credit card transactions and the power of social networks to create a new gifting experience.
Our primary platform is Rails, but there are programming challenges that range from SOAP APIs to Facebook application development. We are searching for full-time developers with expertise and broad experience in:
* Ruby on Rails (we also use rSpec, Starling, Memcache) * MySQL * xHTML & CSS, and comfort with Javascript * Team development with tools like Git & Trac
In addition, we are also interested in candidates who have:
* Expert Javascript skills * Java & SOAP experience * Experience scaling with Rails, or any other web platform * Comprehensive Linux knowledge * UI and graphic design backgrounds
We are willing to pay top-notch developers very competitively (plus the possibility of options) to join our team and help write code that will be used by hundreds of thousands of users within a few months. We are ideally located in downtown Manhattan less than a minute walk from the BDFV and NRQW lines
Also, if you know someone who may be a good fit for us (developer or graphic designer), we are offering a $1000 referral reward for anyone we hire.
Please contact us at jobs@givereal.com
Graphic Design:
Give Real is a well-funded startup in the midst of an exciting period of growth and success. Our technology uses a patent pending platform that combines the ubiquity of credit card transactions and the power of social networks to create a new gifting experience.
We're searching for full-time designers with experience in:
* Design for advertisements * Design for consumer focused websites & applications * xHTML & CSS coding * HTML & design for emails * Working on top of an MVC or template system (we use Rails)
In addition, we are also interested in candidates who have:
* Team development with tools like Git & Trac * Comfort with Javascript programming * Rails programming experience
We are willing to pay top-notch developers very competitively (plus the possibility of options) to join our team and help design the look and feel of a service that will be used by hundreds of thousands of users with a few months.
Also, if you know someone who may be a good fit for us (RoR developer or graphic designer), we are offering a $1000 referral reward for anyone we hire.
Let's assume you have a 512MB table, and you decide to alter the table to add an index to make queries faster.
How long would you expect this alter to take? Hours? Days?
Even with 7200 RPM-slow disks the alter should of finished in less then 1/2 hour.
I ran across an alter that was running for 4 days-on 512MB datasize. The reason why it ran so long is because there was a SELECT that was running preventing mySQL from performing "rename table", the last leg of the ALTER TABLE process.
Killing that SELECT released the shared lock allowing the alter to finish.
Do not KILL THE ALTER when stuck in this SHARED LOCK STATE; Do NOT then remove the temporary tablespace file "#sql-320f_106f99a2.*".
What will happen if you do remove the #sql* file by hand?
Well for one INNODB will crash the mysql instance saying it could not find the temporary table space. It failed to open it. Then on recovery the original table gets unlinked from the filesystem and you just lost all data for that tablespace.
Why?
Here are roughly the order of events for an alter:
Lock all writes from said table Make a temporary table #sql - file Copy all data from the old file to the new file Do a quick consistency check between the two files unlink the old file rename the temp file into the old file name
Each step operates on the data dictionary pointers for the two tables. Issuing a filesystem rm command for the step right before unlink, will cause INNODB to crash and on recovery unlink the old file and of course fail on the rename.
Please help save Ivan, son of Andrii Nikitin (MySQL Support Engineer), who needs a bone marrow transplant. Andrii's message is below:
"My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.
"8 months of heavy and expensive immune suppression brought some positive results so we hoped that recovering is just question of time.
"Ivan is very brave boy - not every human meets so much suffering during whole life, like Ivan already met in his 2,5 years. But long road is still in front of us to get full recover - we are ready to come it through.
"Ukrainian clinics have no technical possibility to do such complex operation, so we need 150-250K EUR for Israel or European or US clinic. The final decision will be made considering amount we able to find. Perhaps my family is able to get ~60% of that by selling the flat where parents leave and some other goods, but we still require external help."
-- Andrii Nikitin, MySQL Engineer
Please remember, every little bit will help the family pay for Ivan's operation! Be as generous as you can.
Today at 1PM EST I am presenting the second part of memcached for MySQL webinar. I was told that the registration numbers look as good as the previous one. This one will be a bit more technical than the previous webinar. Sorry for the late notice but hope you can join!
Now some details that are usually missed. GROUP BY does sorting unless you tell mysql not to. GROUP BY has two optimization methods, loose index scan, and tight index scan.
Loose index scan, scans the entire table index, while tight index scan uses some sort of constraint. For large datasets that are accessed often and require some sort of group by, tight index scans are better.
So how to pick columns to create the optimal indexes. Here is a list of practices (rules) that I personally follow:
1. What is the question asking? 2. What current indexes are on the table? 3. Can the query be re-written to use an existing index? 4. What is the overhead of adding a new index? 5. Follow left-most prefix rules 6. Build indexes that remove filesorts and temporary tables, for all query types. 7. Use statements that reduce the rows examined, and keep each index small. Note: each secondary index in INNODB requires its own page.
So lets look at a table
T(c1,c2,c3,c4) PRIMARY(c1,c2);
A question asking
SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c1,c2,c3 ORDER BY c3 DESC LIMIT 10;
Instinct is to add an INDEX
IDX(c1,c2,c3,c4); Since c1 is the constraint, grouped by c2,c3 SUMMING c4.
BUT when running this condition through explain you'll see that in the EXTRA column
Using where; Using temporary; Using filesort
This is bad, it means WHERE is constraining the clause but the GROUP BY and ORDER BY is producing a temporary table, and a second pass to sort the data.
So your query time is find the rows - put them in temp table which can hit disk, sort them randomly tickling a few thread based buffers. Queries such as these do not scale and can hog up memory.
Let's breakdown what the question is asking.
Give me all the results for c1 where c1 == ?, flatten the results, ORDER the results by the MAX of c3, SUM c4.
Now that we know what the question is asking, lets "re-word" the question
SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2, c3 ORDER BY c3 DESC, c2 DESC LIMIT 10;
This says
Give me all the results for c1 where c1 ==?, flatten the results by the MAX of c3, SUM c4 and if c3.rowN-1 == c3.rowN the tie breaker will be c2.
This is the same asking question reworded. Since GROUP BY AND ORDER BY is sorting the MAX of c3-c2 column refers to the same c2 values as the original query output.
The IDX is now
IDX(c1,c3,c2) and explain produces an extra of
Using WHERE
This is still not good enough-since there is an additional seek to return the columns asked for. Also SUM is being done on the data and not on the IDX.
To fix this IDX should be
IDX(c1,c3,c2,c4) and now explain produces an extra of
Using WHERE; Using index.
What does Using index mean? This means that mysql will not have to do an additional seek to read the actual row.
Now group by and order by have been optimized with adding an additional key.
How did I know to re-word the question and it would produce the same result? Lets look at order by: That is the ending statement that changes the data. So, since c3 is also used to flatten the constraint I knew that c3 is all that is needed to refer to the same rows, c2 is used to be a tie-breaker when c3RowN-1 == c3RownN.
These are the sort of hints you can use to optimize SQL.
In just a few minutes, I will be leaving for Graphing Social Patterns East, a conference by Oreilly. Dave McClure of 500 Hats is the conference chair. I plan to meet old friends and make new ones. It should be a lot of fun. More about Graphing Social Patterns.
Ever wish you could have a browser based shell for Google? One that was clutter and advertising free? Say hello to Goosh, one of the coolest service to hit the web.
It even recognizes 'clear' :) For now, I am addicted to it.
Disaster is really inevitable. Even with all the redundant power investments, ThePlanet (formerly EV1 and RackShack), had to shut down their backup generators at their H1 data center on the instructions of the fire crew. This happened after a wire-short in fault transformer led to an explosion that knocked off one of their walls, ultimately bringing 9,000 servers down. Luckily no one was injured.
This just goes on to show that just because a data center has redundant power and backup generators, it does not mean that a disaster cannot happen. IIRC, ThePlanet's last disaster was blamed on backup generators not kicking off properly.
While there was no damage to servers, I wonder how many MyISAM repairs need to be triggered once the servers do come back online?
Michael Arrington of TechCrunch asks Twitter a few questions. I have only included a sample list below but you should read his blog post for all the questions:
Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fail?
A 'yes' answer to any of these questions by Twitter would be disturbing to say the least. However, it won't be surprising as companies expect databases to just somehow magically work without creating and supporting a proper architecture. High availability doesn't comes cheap and reputation for companies is everything.
I find it amusing that Twitter isn't even looking for a DBA. May be that's considered a job for the SA over there :)
A big thank you to all those who attended the memcached webinar today on which I was a panelist. I was told that there were more than 560 registrants.
The feedback I received directly and indirectly shows that there is a lot of interest about memcached. In the future, I hope to work again with MySQL/Sun on more memcached related webinars.
If you attended the webinar and have some suggestions, comments or questions, please contact me at fmashraqi at yahoo dot com or post a comment on this blog.
Special thanks to Jimmy Guerrero, Monty Taylor, Rich Taylor, Edwin DeSouza and Alex Roedling for their hard work in arranging the webinar. Also thanks to Brian Aker, Matt Ingenthron and Trond Norbye for their assistance at various phases.
Regarding my earlier post on memcached webinar, I was informed today that more than 420 registrants have signed up. Space is limited and filling up fast so if you are interested in memcached and haven't registered yet, click on the following link to register now!
Today, many large scale web 2.0 properties (including my employer) save millions of dollars by depending on memcached to bring their application response time under control and to offload pressure from databases.
There are several success stories about using memcached to speed up database driven websites. Facebook, for instance, runs the largest memcached installation and the numbers only keep increasing. In May 2007, Facebook was reportedly running 200 dedicated servers with 3TB of memory in their memcached cluster. At the "Scaling MySQL Up or Out" Keynote, Facebook revealed they are now using 805 dedicated memcached servers. That's more than a 400% increase in less than a year!
For my employer, memcached has been a crucial component of the infrastructure that has been instrumental in handling explosive growth in a cost-efficient manner. In addition, memcached has helped us offload billions of queries from our database.
To highlight several real-life use cases of memcached (see below), I will be presenting a memcached webinar on Thursday, June 29 at 1 PM EST (10 AM PST). Monty Taylor (Senior Consultant, Sun Microsystems) and Jimmy Guerrero (Sr Product Marketing Manager, Sun Microsystems - Database Group) will also be speaking at the event. Space is limited and filling up fast (200+ registrants already) so I highly recommend registering now.
In this webinar, I will be covering several use cases for memcached including (but not limited to):
deterministic cache
non-deterministic cache
proactive cache
"state" cache
filesystem cache replacement
Hope to "see" you at the webinar. Note: This memcached webinar is not to be confused with the memcached webinar being presented by Ivan Zoratti on June 28.
So imagine you have a table and for every row it contains an INT that represents a user, a medium int that represents an activity number, and a ENUM that represents state.
CREATE TABLE UserStack ( id int unsigned not null default 0, activity medium int unsigned not null default 0, state ENUM('ready','processing','processed', 'error'), PRIMARY KEY(id), INDEX (state, activity) ) ENGINE = INNODB;
Now you have an application that is spread across many servers which pops a set of users off the stack and sets them to a processing state.
START TRANSACTION
SELECT * FROM UserStack WHERE state='ready' ORDER BY activity DESC LIMIT 10 FOR UPDATE
foreach id
Mark them Processing
If multiple threads do this at the same time; the 1st thread will get the 10 ids while 99% of the others will fail with "Deadlock Detected try restarting transaction"
The reason: from innodb's perspective many different clients are asking to perform actions on the same data all at the same time-thus a deadlock is detected. To exacerbate the problem the data is ordered differently from the cluster index-so in essence the entire table is scanned for this example. The table has 30 million rows.
I've tested transaction isolation levels:
READ COMMITED REPEATABLE READ SERIALIZED
How to get around this:
Lets add a column to make the rows unique for the calling process; lets add pid (add server int unsigned if you want to run the process from many servers).
The access pattern for the table is now going to involve pid and state. The column pid nor the combo with state is unique, so the user id which defines the user.
ALTER TABLE UserStack DROP PRIMARY KEY, ADD pid int unsigned NOT NULL DEFAULT 0, ADD PRIMARY KEY(state,pid,id), DROP INDEX state;
Then change SELECT ... FOR UPDATE to an UPDATE statement. UPDATE operations for some reason are better at concurrency then SELECT FOR UPDATE. The update scans the PRIMARY index and updates the selected amount of rows defined by the LIMIT.
UPDATE UserStack SET pid=getmypid(), state='processing' WHERE pid = 0 AND state='ready' LIMIT 10;
SELECT * FROM UserStack WHERE pid = getmypid() AND state='processing';
foreach user process mark as completed
Rinse and repeat.
The desired affect is complete. Each thread can grab its own work and each thread is guaranteed a unique user or block of users to process. The act of marking rows (marking their territory) and grabbing the marked rows takes a fraction of seconds as it should.
At the MySQL Conference and Expo, right after my participation in scaling up or scaling out keynote panel, I talked to Sun's Multimedia team about Sun and MySQL in our environment.
Recently, I found the interview on Sun's Multimedia page. The video of my discussion is embedded below:
Over the weekend I took some notes from a presentation and did some research from various sources. The result was a blog post about Internet trends that I posted on my personal blog. There are some very interesting statistics about Internet usage and social networking. Also, Facebook fans will find some interesting facts as well.
When InnoDB creates a new transaction it is not yet committed. The database has not yet made any promises to the application and so we do not really have to make anything persistent so far.
To be fast InnoDB tries to assemble the transaction in a memory buffer, the innodb_log_buffer. It should be sufficiently large that you actually can assemble such a transaction in memory without needing to write it out in part into the redo log. A size of 1M to 8M is normal.
Once a transaction is to be committed InnoDB has to read the page from disk which contains the image of the row that is being changed. It then has to actually make that change in memory. The changed page is cached in a memory pool called the innodb_buffer_pool. This pool also caches unchanged pages that have been accessed by read operations. All of these pages on disk and in memory are 16K in size and the innodb_buffer_pool_size determines how much RAM we will use as a cache for such pages - usually as much as we can spare.
InnoDB does transactions. Meaning: It collects statements working on InnoDB tables and applies them on COMMIT to all tables "at once". Either all of these statements inside one transaction succeed ("commit") or all of them fail ("rollback"), changing nothing.
By default, the database is in AUTOCOMMIT mode. Meaning: The server sees a virtual COMMIT command after each statement. You can disable autocommit completely, or you are starting an explicit transaction inside autocommit using the BEGIN statement.
MySQL offers a variety of storage engines giving you a lot of flexibility in managing your storage and data access needs. Still I encounter customers who are not using this flexibility when they should, because they lack information about the advantages of Non-MyISAM storage engines or which are using storage engines like InnoDB as if they were using MyISAM.
This is the introductory article in a series of texts that will hopefully once become an InnoDB tutorial. A german version of this article is available in my german language blog.
An InnoDB Tutorial
The InnoDB storage engine is an engine that can be operated ACID compliant, does transactions and foreign key constraints. It is useful for all applications that do online transaction processing or have a high rate of concurrent write accesses for other reasons.
PHP is a nice programming language for web applications with a large number of databases supported.
PDO is one of many database access abstractions trying to unify the way PHP talks to databases. Unlike most of these, PDO is written in C and not in PHP.
Wez Furlong wants to improve PDO "Version 2". In particular he and others want to produce an open spec that can be used by anybody including database vendors to code against, add unit testing for the spec and the drivers that claim to support it and improve metadata handling for PDO.
In order to get database vendors, notably IBM, into the game he proposes a Contributor License Agreement to make it easier for vendors to commit into the PDO part of PHP. There exist corporate and individual versions of the CLA, and the PDO license is yet another license that is different from the PHP license. Wez has prepared a FAQ covering questions regarding all that. Continue reading "PHP PDO V2 CLA"
My thinking was that the topic is already dead, but people have strange ideas off and on again. Have a look at the S9Y boards where you'll find someone who wants to have a storage backend "LDAP" for S9Y. This is sick and wrong! Let me explain why.
According to my findings in Bug #31876, MySQL does not commit data to disk in Windows using the same method MS SQL Server and DB/2 are using. The method MySQL uses appears to be seven times slower in pathological scenarios.
The bug report contains a patch - thanks to the MySQL WTF (The Windows Task Force) and the lab provided by the customer for helping me to find that.
Does this work for you? I want to hear about your test results.
Lately, I have had opportunity to evaluate a very large Ruby installation that also was growing very quickly. A lot of the work performed on site has been specific to the site, but other observations are true for the platform no matter what is being done on it. This article is about Ruby On Rails and its interaction with MySQL in general.
One of the major contributing factors to the success of MySQL is the ease and simplicity of its replication. Read-slaves for scaleout and backup-slaves for noninterrupting backups are the norm in any MySQL installation I have seen in the last two years.
So how does replication work? And how shall it be expanded in the future?
What is available?
The binlog written by MySQL currently logs all statements changing the tablespace. It is a serialization of all tablespace changes. The binlog position, expressed as (binlog name, offset), is a database global timestamp - a timestamp expressed in seconds.fraction does not work for any precision at all, because on a multi-core machine multiple things can happen concurrently.
If you want to make a consistent full backup of the database, the database must not change during the backup. That is, it must be possible to associate one and exactly one binlog position with the backup.
In fact, if you have such a backup - one associated with a binlog position - and you happen to have the binlogs from that time until now, it is possible to do a point-in-time (PIT) recovery. You'd recover from the full backup and you'd then replay the binlog from the backups binlog position until now. That is why it is important to store the binlog in a filesystem that fails independently from the rest of your MySQL. That's also why you must not filter the binlog that is written by MySQL using binlog-do-db and binlog-ignore-db - if you do, you'll get an incomplete binlog that will fail to be useful in a PIT recovery scenario.
A slave in MySQL is now nothing but a binlog downloader and executor: The slave must be restored from a PIT-capable full backup. It is then being told the current binlog position and where to log in to get the missing binlog. The slaves IO_THREAD will then log into the master server and download the binlog to the local disk as fast as possible, storing it as the relay log. The slaves SQL_THREAD will then start to execute the relay log as fast as possible. Replication can thus be thought of as an ongoing live recovery.
In MyISAM, we do have LOAD INDEX INTO CACHE. In InnoDB this does not work. For benchmarking I often require a way to preload the innodb_buffer_pool with the primary key and data after a server restart to shorten warmup phases.
According to Blackhole Specialist Kai, the following should work:
CODE:
mysql> create table t like innodbtable;
mysql> alter table t engine = blackhole;
mysql> insert into t select * from innodbtable;
Another win for the unbreakable BLACKHOLE storage engine.
Sorry for the low maintenance level on this site, personal matters have limited my available time. I plan to now come back to more active maintenance of this site, beginning with an update to the forums.
The forums now allow for registration without admin approval but have a new requirement that those registering must answer a simple question to submit their registration. This should hopefully allow new members to join without waiting for their approval, while also preventing spam registration.
When I return from the MySQL Conference I’ll be restoring the long-missing samplecode section to the site and restoring the three-parts of the Yatta tutorial.
One of the recent issues wiped out my copies of Yatta that were sent to me by you the readers. If you previously sent me Yatta, please re-send to mike@thisdomain.
During the site failure the articles and source for the Yatta series was lost, so I thought I would put a call out: do you have a local copy of the articles and source for the Yatta series? If so, please comment here or send me an email!
I didn’t have a chance to restore the full structure to vbmysqldirect’s project pages yet, but I have posted an archive of the project pages to our new Projects page.
Good news everyone! At this point most of the articles are restored. I’ll be finishing the restoration of the last two or three articles soon and moving on to a restoration of the sample code and the project pages (vbmysqldirect in particular).
Thank you for waiting so far, and if you find any errors in the restored articles please let me know.
Those of you who have been around for a while will likely know that part 4 of the tutorial had some code that became outdated when VB2005 came out of Beta. I’ve updated the code samples in the article and its zip file but lack the time right now to do a full validation due to work.
As such, I need your help. If someone out there could be so kind as to start at Part One and work through the tutorial using the current release of VB2005 to check for issues it will be much appreciated.
I’ll try to get parts 5-7 up soon and move on to the remaining article content. Then I’ll work on getting samplecode online.
Also, new content is always appreciated. If you have an article or sample you want to submit feel free to let me know and I’ll create a contributor account for you in the CMS.
I’m also looking for someone to manage the Spanish language articles and moderate the new Spanish language topic in the forums.
I’ve installed forums to http://www.vbmysql.com/forums. Because of the new hosting arrangements and the difficulty in restoring the previous forum software, I’ve changed backends. This means you’ll need to re-register and recreate any posts you were still looking for answers to.
I’ve restored two articles so far and would appreciate any feedback as to the appearance/style of the articles posted so far. Since I’m rebuilding site design as I restore articles, now is a good time to voice your opinion.