Today there was a question on the Freenode MySQL channel about a classical
problem: Rows
holding group-wise maximum of a column. This is a problem that I keep
encountering every so often, so I thought I would write up something about it.
A good example of the problem is a table like the following holding versioned
objects:
CREATE TABLE object_versions (
object_id INT NOT NULL,
version INT NOT NULL,
data VARCHAR(1000),
PRIMARY KEY(object_id, version)
) ENGINE=InnoDB
Now it is easy to get the latest version for an object:
SELECT data FROM object_versions WHERE object_id = ? ORDER BY version DESC LIMIT 1
The query will even be very fast as it can use the index to directly fetch the
right row:
mysql> EXPLAIN SELECT data FROM object_versions
WHERE object_id = 42 ORDER BY version DESC LIMIT 1;
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | object_versions | ref | PRIMARY | PRIMARY | 4 | const | 3 | Using where |
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
But what if we want to select the latest version of all (or some range
of) objects? This is a problem that I think standard SQL (or any SQL dialect
that I know of, including MySQL) has no satisfactory answer to.
Intuitively, the problem should be simple for the database engine to
solve. Just traverse the BTree structure of the primary key (assume InnoDB
clustered index storage here), and for each value of the first part of the
primary key (object_id), pick the highest value of the second
part (version) and return the corresponding row (this is similar
to what I believe is sometimes called index skip scan). However, this
idea is surprisingly difficult to express in SQL.
The first method suggested in the above link to the MySQL manual works in this
case, but it is not all that great in my opinion. For example, it does not work
well if the column that MAX is computed over is not unique per group (as it is
in this example with versions); it will return all of the maximal rows which
may or may not be what you wanted. And the query plan is not all that great either:
mysql> EXPLAIN SELECT data FROM object_versions o1 WHERE version =
(SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
| 1 | PRIMARY | o1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
| 2 | DEPENDENT SUBQUERY | o2 | ref | PRIMARY | PRIMARY | 4 | einstein.o1.object_id | 1 | Using index |
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)
It is apparently doing a full table scan with an index lookup for every row in the table,
which is not that bad, but certainly more expensive than necessary,
especially if there are many versions per object. Still, it is probably the
best method in most cases (or so I thought first, but see benchmarks below!).
The two other suggestions from the MySQL manual are not perfect either (though
the first one is blazingly fast, see benchmarks below). One is
to use an uncorrelated subquery with a join:
mysql> EXPLAIN SELECT o1.data FROM object_versions o1
INNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2
ON (o1.object_id = o2.object_id AND o1.version = o2.version);
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | o1 | eq_ref | PRIMARY | PRIMARY | 8 | o2.object_id,o2.version | 1 | |
| 2 | DERIVED | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
At first, I actually did not know exactly how to interpret this plan
output. After the benchmarks given below, I now think this plan is actually
very good, apparently it is first using something like an index skip scan to
compute the MAX() in the uncorrelated subquery, and then looking
up each row using the primary key. It still has the issue with multiple rows
if version was not unique per object.
The other suggestion uses an outer self-join:
mysql> EXPLAIN SELECT o1.data FROM object_versions o1
LEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version < o2.version
WHERE o2.object_id IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
| 1 | SIMPLE | o1 | ALL | NULL | NULL | NULL | NULL | 6 | |
| 1 | SIMPLE | o2 | ref | PRIMARY | PRIMARY | 4 | einstein.o1.object_id | 1 | Using where; Using index; Not exists |
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
2 rows in set (0.00 sec)
The plan again looks reasonable, but not optimal. And somehow, all three
methods feel unnatural for something that ought to be simple to express.
And in fact, there is a nice way to express this in SQL, except that it does
not work (at least not in MySQL):
SELECT MAX(version, data) FROM object_versions GROUP BY object_id;
If there was just support for computing MAX() over multiple
columns like this, this query would be a nice, natural, and simple way to
express our problem. And it would be relatively easy for database engines to
create the optimal plan, I think index skip scan is fairly standard
already for single-column MAX() with GROUP BY. And
the syntax feels very natural, even though it does bend the rules somehow by a
single expression (MAX(version, data)) returning multiple
columns. I have half a mind to try to implement it myself in MySQL or Drizzle
one day ...
In fact, one can almost use this technique by an old trick-of-the-trade:
mysql> SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
+---------------------------------+
| max(concat(version, ":", data)) |
+---------------------------------+
| 2:foo2 |
| 1:bar |
| 3:baz2 |
+---------------------------------+
3 rows in set (0.00 sec)
mysql> EXPLAIN SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
Though I consider this (and variations thereof) a hack with limited practical
usage.
And speaking of hacks, there is actually another way to solve the
problem, one which I learned about recently at a customer:
mysql> EXPLAIN SELECT data FROM
(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
| 2 | DERIVED | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
Get it? This cleverly/evilly (ab)uses the MySQL non-standard extension which
allows SELECT of columns not in the GROUP BY clause even without using
aggregate functions. MySQL will return a value for the column from an
"arbitrary" row in the group. In practise, it chooses it deterministically
from the first row in the group, which is why this trick seems to work well in
practise. But it is clearly documented to not be supported, so not
really something to recommend, though interesting to see.
Bonus benchmark
As a free bonus, I decided to run some quick benchmarks. As it turns out, the
results are quite surprising!
So I filled the table above with 1,000,000 rows, 1000 objects each with 1000
versions. Total table size is about 50Mb or so. I then ran each of the five
above queries:
mysql> SELECT data FROM object_versions o1
WHERE version = (SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);
1000 rows in set (4 min 22.86 sec)
mysql> SELECT o1.data FROM object_versions o1
INNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2
ON (o1.object_id = o2.object_id AND o1.version = o2.version);
1000 rows in set (0.01 sec)
mysql> SELECT o1.data FROM object_versions o1
LEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version < o2.version
WHERE o2.object_id IS NULL;
1000 rows in set (2 min 42.72 sec)
mysql> SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
1000 rows in set (0.63 sec)
mysql> SELECT data FROM
(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;
1000 rows in set (15.61 sec)
The differences are huge!
The clear winner is query 2, the uncorrelated subquery. Apparently it can do
an index skip scan for the inner MAX/GROUP BY query, followed with a primary
key join, so it only ever has to touch 1000 rows. An almost optimal plan.
Query 1 and 3 (correlated subquery and outer join) are spectacularly bad. It
looks as if they are doing something like for each 1,000,000 rows in the full
table scan, do a 1000 row index range scan in the subquery/join, for a total
of 1 billion rows examined. Or something. Not good.
Query 4 and 5, the trick queries, are doing so-so, probably they get away with
a sort of a full table scan of 1,000,000 rows.
Conclusions: Uncorrelated subquery is the undisputed winner!
I write emails, blog pages, Wiki pages. I create spreadsheets, presentations and web pages. I even do some detailed formatting of photo books and I layout text in detail with graphic programs or publishing tools.
But what I find myself doing more and more seldom is writing standalone documents.
I know the world is full of .docs, and I was just reminded of that, when talking to Marino Marcich, the head of the ODF Alliance, in Ankara, Turkey. I fully symphatise with the usage of an open document format, where there is true openness, true choice and true competition between software suppliers — and not a monopoly based on format changes at the whim of individual companies.
However, my thinking is that the use of classical “Word processing” is a shrinking use case. I may be an exception, but:
I prefer getting emails where the message is in the email body, not in an attachment.
I can read the message quicker.
I can search quicker within the message, or within the message folder.
I have less data to archive.
I prefer finding and reading data over the web, rather than having to myself manage a directory tree containing documents.
I like sharing texts where I am in charge of their persistence, through pointing to it in the form of an URL.
If I want to enable my readers to read my texts offline, I just cut-and-paste a pure-text version as the email body.
In the rare case that I want to control format and positioning of texts (for binders or other texts designed for usage on dead trees only), I use a drawing program, such as Omni Graffle.
This means that although I use Open Office, I do so for .odp files (presentations) and .ods files (spreadsheets), but hardly ever .odt files (texts).
True, it’s much more annoying to get a .doc file than an .odt file, but getting any text attachment at all is usually quite a bit of overkill and an annoyance, at least for me.
I addressed this thinking with Aslam Raffee, the chairperson of the OSS Workgroup in the South African Government IT Officers Council a few weeks ago. And judging from his reaction, the standalone document isn’t quite dead yet. But if it were for myself, and many of my MySQLer colleagues, the standalone document would soon face extinction.
I’ve been using Google Reader since it was created. I really love the *idea* of Google Reader. I like that scrolling through the posts marks them as read. I like that you can toggle between list and expanded views of the posts. I like that you can search within a feed or across all feeds (though selecting multiple specific feeds would be great).
All of that said, I’d like to explore other avenues, because I don’t like that there’s, like, zero flexibility in how the Google Reader interface is configured. My problem starts with large fonts…
I use relatively large fonts. If you increase the font twice up from the default size in firefox on a mac (using the cmd-+ keystroke, twice), and you have more than just a couple of feeds, you wind up with this really horrible side pane with the bottom half of it requiring a scroll bar, and the text wraps, and it just looks terrible. What makes this really REALLY REALLY annoying is that:
I don’t use the features included in the *top* part of the side pane, ever, at all (like ‘trends’ and stuff), and
You can’t resize or disable that part of the side pane.
I’ve used folders and some other features to try to alleviate the issue, but it’s just a compromise, and I’d rather not do that if something else would work better for me. I’ve had a couple of quick glances at just a couple of other readers, but I thought I’d get some input from the lazyweb to see what your thoughts are. Is there a browser-based feed reader that has some of Google’s niceties, but perhaps with a little bit nicer/more configurable interface? Out of curiosity, are you using a Mac-compatible fat-client reader that just totally r0cks in some way? If so, let me know in the comments.
There have been plenty of blog entries and writings about the MySQL Query Analyzer, for what I think are good reasons. Labeling it a potential Killer App, causing many MySQL users to become paying Sun customers, may be a daring thing. However, the Query Analyzer might very well have what it takes. The key benefit of it is that it identifies the source of performance bottlenecks. In that sense, one could perhaps instead call it a profiler, as it analyses the set of all queries going on, as opposed to an individual one. One person to whom I described it said “ah, so it’s sort of a DTrace for MySQL queries?” — and for those who know what DTrace is, it’s not a bad analogy.
And the non-FOSS nature of the Query Analyzer is something I need to underline, in order not to set false expectations. It is part of the MySQL Enterprise subscription offering. It is not GPL software. So what this means is that you will be confronted with registrations, with trial periods and similar things familiar from other non-FOSS software, should you decide to try it out. You may agree or disagree as to whether that is right or wrong, but that is the business model we’ve chosen at Sun Microsystems (and read Zack’s blog post for some of the underlying reasoning). And while my role as VP MySQL Community is primarily focused at growing and serving the MySQL user base, I am of course also interested in Sun having a business model for MySQL that enables growth in economic terms. I’m excited to see where the Query Analyzer will take us!
So, to sum it up: You may recall various discussions on Sun wanting to serve both the group of people who “have more time than money” and those who “have more money than time“. The value of MySQL Query Analyzer just might be compelling enough for many to count themselves into the latter group.
Greg Rahn of Structured Data explained the use of a preprocessor for external tables. He writes, “Before External Tables existed in the Oracle database, loading from flat files was done via SQL*Loader. One option that some used was to have a compressed text file and load it with SQL*Loader via a named pipe. This allowed one not to have to extract the file, which could be several times the size of the compressed file. As of 11.1.0.7, a similar feature is now available for External Tables . . . ”
On Oracle Corp’s Oracle Certification blog, Paul Sorensen debuts a new booklet called “Performing an Oracle DBA 1.0 to DBA 2.0 Upgrade”: “The knowledge required by the typical Oracle DBA has increased, requiring additional skills and in some cases job responsibilities. Oracle Certified Masters Joel Goodman and Harald van Breederode discuss this shift in DBA skills in this 10-page PDF booklet . . . ”
Brian, Monty, Stewart, Lee and myself sat down yesterday and fleshed out the blueprint tasks that we are targeting for the Cirrus milestone. This marks the first time we've made a concerted effort to really determine the tasks which are of essential importance in moving towards Drizzle's first release. Before I detail some of the major tasks, I'd like to make a quick recognition of our contributor community.
Although the MySQL server does have community contributions in some of the releases, the Cirrus milestone marks something of a new day in MySQL-related development. Cirrus contains tasks which are actively being developed by external contributors. This may not sound like a huge deal, but it is.
In the past, contributions have been included in the MySQL server, however these contributions have always been included after the code has been contributed. For instance, Jeremy Cole's SHOW PROFILES patch, although heavily modified from its original submitted form, was included in MySQL Community Server after a long period of code review and modification. However, to my knowledge, the code contributor community has never been actively involved in either ongoing feature development for a release, nor actively involved in the direction in which the server is developed.
Cirrus marks a new day. Not only are tasks for Cirrus assigned to external contributors, but the decision-making and strategic power of the release is in the community's hands. The only reason a community member would not have a say in the direction of the server is if they don't speak up and share an opinion. As of this morning, there are 299 members of the drizzle-discuss mailing list. All of these members have a say in what gets done in Drizzle. This makes me a happy boy.
A Note on What a "Release" Is
Before the emails start firing off about what's in the first release of Drizzle and when it will come, I'd like to note that we are not going for a "big bang" approach to releasing software. The tasks I outline below are targets for a milestone. These tasks do not mean that the first release of Drizzle will contain all of the listed items. In fact, to be sure, some of them likely won't make it into the first release, and other tasks not listed currently for the milestone will "make it in". Although the community will eventually decide the release model, most (all?) of the developers sitting at Brian's table agree that an Ubuntu-like release model leads to more stable and consistent releases.
By "Ubuntu-like", I mean that it is the release date which is important to be kept stable, and not the list of features contained in the release. People want consistency in when to expect the next release; it makes it easy to look forward to a certain date. What is less important is what is included in the release. What counts is that each release is stable and demonstrates incremental improvements at a consistent rate. I'll be blogging more about this concept shortly and will start a discussion on the mailing list regarding possible release dates and a schedule for locking down commits before that date. Whatever is feature-complete at the time of lock-down goes into the release. Nothing more. Why? Because stability is more important. With a set release cycle, the feature that "missed the deadline" will eventually make it into the code base in a shorter amount of time, in a consistent and stable manner.
Targets for Cirrus
There are a number of major areas that Cirrus is targeting:
Many tasks in the "cleanup, reuse and refactor" category have already been completed, by Monty, Brian, myself, and community contributors such as Toru Maesaka, Patrick Galbraith, Eric Day, C.J. Collier, and Yoshinori Sano. These tasks are listed on the blueprints page starting with "code-cleanup". They are also not as dependent on each other as some of the other task areas.
Feel free to click through on the various links to the milestone and blueprint tasks in this blog post, comment on the mailing lists, and be an active contributor. Nothing is off limits.
Sun updates MySQL Enterprise. The Microsoft/Novell deal is two years old. Nuxeo and Boxee get funding. Red Hat’s CEO on open source in a downturn. Steve Ballmer as a glove puppet. And more.
Audio/visual Steve Ballmer on Yahoo Acquisition 1938enterprise The clip linked to above features a glove puppet of Steve Ballmer and some very strong language. Do not click it if you are easily offended. Do click it if you are easily amused.
Progress on the Open Source enterprise grade MySQL monitoring system; the schema for Monolith version 2 has been designed. Due to the many suggestions for features and the interest it has received I’ve put this on the front burner. That said, here is some more info on the next steps I’ll be taking.
Monitored servers will use a command line agent (called remotely) to pull information from both MySQL and the OS.
Historical information will include all values from global status and global variables, as well as CPU/Memory/IO/Disk usage.
Standard graphing functions - the ones in the list from the previous post - will gather information from various view tables that contain historical data that is collected from the agent script. User defined graphing will allow you to look at historical values over time for any of the various global status or global variable settings (integer based ones anyway).
The cnf file for each server will be stored in the monitoring database and available for viewing for historical purposes.
It almost never crashes for me now, but you might not be able to make the same claim. Well, if you use MyISAM rather than InnoDB you might be able to make the claim, but in that case you really need it to not crash as MyISAM might not recover to a transaction consistent state. The fix is not in the official release. This is almost a repeat of a recent post, but one thing has changed. We deployed the fix for bug 32149 and eliminated the cause of the majority of crashes from software bugs. We still get crashes on a daily basis, but hardware is the cause.
MySQL 6.0.8 has new interfaces for semisynchronous replication.
The interfaces make it possible to load components to ensure that all transactions are replicated to the slave before the users of the master gets acceptance of the commits.
The original code was developed by Mark Callaghan (Google) and Wei Li (Google). Zhenxing He (MySQL) ported the Google code to MySQL 6.0, extracted the functionality into pluggable components, and changed the code so that all storage engines are supported. Paul DuBois (MySQL) wrote the MySQL manual documentation.
The extraction of this patch into components is a first step in our strategy to make MySQL Replication modularized with separate loadable components for special replication functionality. With more interfaces to the server, it becomes easier to make the server behave in the way you want.
Please let us know what you think. Does it work as you expect? Do you want the synchronization to be different? Want to publish your own replication components for MySQL?
Peter wrote a post a while ago about choosing a good InnoDB log file size. Not to pick on Peter, but the post actually kind of talks about a lot of things and then doesn't tell you how to choose a good log file size! So I thought I'd clarify it a little.
The basic point is that your log file needs to be big enough to let InnoDB optimize its I/O, but not so big that recovery takes a long time. That much Peter covered really well. But how do you choose that size? I'll show you a rule of thumb that works pretty well.
In most cases, when people give you a formula for choosing a configuration setting, you should look at it with skepticism. But in this case you can calculate a reasonable value, believe it or not. Run these queries at your server's peak usage time:
mysql> SHOW engine innodb STATUS\G SELECT sleep(60); SHOW engine innodb STATUS\G
Log sequence number 843836410803
1 row INSET(0.06 sec)
1 row INSET(1 min 0.00 sec)
Log sequence number 843838334638
1 row INSET(0.05 sec)
Notice the log sequence number. That's the total number of bytes written to the transaction log. So, now you can see how many MB have been written to the log in one minute. (The technique I showed here works on all versions of MySQL. In 5.0 and newer, you can just watch Innodb_os_log_written from SHOW GLOBAL STATUS, too.)
As a rough rule of thumb, you can make the log big enough that it can hold at most an hour or so of logs. That's generally plenty of data for InnoDB to work with; an hour's worth is more than enough so that it can reorder the writes to use sequential I/O during the flushing and checkpointing process. At this rate, this server could use about 110 MB of logs, total. Round it up to 128 for good measure. Since there are two log files by default, divide that in half, and now you can set
Does that look surprisingly small? It might. I commonly see log file sizes in the gigabyte ranges. But that's generally a mistake. The server I used for the measurements above is a big one doing a lot of work, not a toy. Log file sizes can't be left at the default 5MB for any real workload, but they often don't need to be as big as you might think, either.
If this rule-of-thumb calculation ends up showing you that your log file size ought to be many gigabytes, well, you have a more active write workload. Perhaps you're inserting a lot of big rows or something. In this case you might want to make the log smaller so you don't end up with GB of logs. But also realize this: the recovery time depends not only on the total log file size, but the number of entries in it. If you're writing huge entries to the log, fewer log entries will fit into a given log file size, which will generally make recovery faster than you might expect with a big log.
However, most of the time when I run this calculation, I end up finding that the log file size needs to be a lot smaller than it's configured to be. In part that's because InnoDB's log entries are very compact. The other reason is that the common advice to size the logs as a fraction of the buffer pool size is just wrong.
One final note: huge buffer pools or really unusual workloads may require bigger (or smaller!) log sizes. This is where formulas break down and judgment and experience are needed. But this "rule of thumb" is generally a good sane place to start.
copy cluster/config/config.ini to a safe place (if you have made changes to it)
copy cluster/config/my.cnf to a safe place (if you have made changes to comment out #skip_thread_priority in my.cnf if you have it set (it is deprecated).
generate a new config that mimics your current config so that you get exactly the same scripts (but a new version of them).
Unpack the mysqlcluster-63.tar.gz
copy the config.ini and the my.cnf from the safe place to mysqlcluster-63cluster/config/
cd mysqlcluster-63/cluster/scripts
sh download-and-compile.sh
sh dist.sh
sh rolling-restart.sh (will copy out the config files from cluster/config hence it was important to copy them back there)
CMON If you are using cmon-0.10 (you should upgrade to cmon-0.11) or cmon-0.11:
stop cmon
start a mysql client:
use cmon
drop mysql_statistics;
drop mysql_variables;
start cmon
cmon will now recreate the mysql_statistics and mysql_variables tables. This is because the MySQL server in MySQL Cluster 6.3.19 (and 6.3.18) has added more status variables. If you don't do the above, then cmon cannot aggregate mysql statistics.
I have the pleasure to invite the Ukrainian and Russian MySQL communities to three events in the upcoming weeks, all of which I am attending myself.
Firstly, there is the Sun Microsystems ISV partnership event in Kiev on Wed 26.11.2008 from 10:00-18:00. The program for this event is published on my Russian language blog. Please register for this event through an email to Svetlana.Kovtun@Sun.COM.
Second, right after the ISV partnership event, we have a MySQL community get-together in the Sun offices within a 15 minutes walk. We’ll talk about MySQL and Sun, the integration status and plans, at 18:30-20:00. I hope to get several of the former MySQL employees, current Sun employees, attending. As a community member, you can attend either one or both of these events, but always register with Svetlana.Kovtun@Sun.COM.
Third, we have a community event for the Moscow MySQL users. It happens on Monday 1 December 2008 from 19:00 onwards in the High School of Economics at Myasnitskaya 20, room 124.
I have measured the time it takes to do some operations with Cluster (6.3.19, which is about to be released very soon - you should upgrade) with respect to the disk, like starting it, creating table spaces etc.
Below all of this you can find the setup I used.
Initial start of Cluster (40GB of redo log files): 6.3.19: 3min 27sec
Create a 20GB undo file: 6.3.19: 6min 17sec
Create a 128MB data file for the tablespace: 6.3.19: ~3 sec
Insert 1M records a' 4096B (1 thread, batches of five): 6.3.19: 286 sec (3721.12 QPS) (we can probably provision faster with bigger batches or more threads)
I then provisioned another 4M records (total of 5M records in DB).
Evil test: 100K Random reads (read 4096B) (5M records in DB): 6.3.19: 1290.42QPS (20 threads, io util is ~90% so we are almost completely io bound). So this is result is inline what we would expect when being io bound, especially since i have used four data nodes, each having one disk.
Setup:
Total of 4 data nodes on 4 computers.
Another computer was used to generate the inserts/reads
8GB of RAM
Gig-E
1 * 146GB SAS, 10KRPM
128MB IndexMemory
1024MB DataMemory
ODirect=1
SharedGlobalMemory=384M
NoOfFragmentLogFiles=40
FragmentLogFileSize=256M
DiskPageBufferMemory=3072M
Table space (one ts) with 100 datafiles a' 128MB (best practice is to use many but small data files instead of one big. This will be changed in 6.4 so that you can use one big data file).
Extent size=1MB (which is quite ok)
Logfile group: One 20GB undofile and 128MB Undobuffer The undo file was a bit too big (not that it matter, i had the disk space) but I used 5366054928 extents out of 21474836480 (so ~25% was only used).
There is also a new configuration option in 6.3.19 which lets you create the data files
The disk data table looks like (data column will be stored on disk): create table dd ( id integer primary key, ts integer, data varbinary(4096), index(ts)) engine=ndb TABLESPACE ts_1 storage disk;
Now, especially for inserts/writes there are quite a few things competing for the single disk:
REDO log
LCP
and the disk data itself (UNDO LOG + DATA FILES)
However, the best setup is to (if you have two disks)
Disk 1: REDO + LCP
Disk 2: UNDO LOG + DATA FILES
Three disks:
Disk 1: REDO + LCP
Disk 2: UNDO LOG
Disk 3: DATA FILES
If you have more than that then you can put DATA FILES and UNDO LOG on a RAID.
IMPORTANT: When you have done an --initial start, the files for the UNDO LOG and the DATA FILES are NOT removed. You have to remove them by hand. Otherwise you will get an error if you try to CREATE LOGFILE GROUP..
So it’s been a while since I’ve hit my blog, but I feel compelled to respond to Baron’s post, and many of the other (perhaps short sighted) criticisms out there against this new functionality that we’ve been working on for so long.
Everybody seems to be saying that this functionality should be implemented in the server, or that the better way to do this is to use these patches which add functionality to the logging that MySQL already provides. Well guess what people - what does that give you, other than some more details on you queries?
More I/O.
What’s bad on a database server?
More I/O.
Query Analyzer, whilst it does currently use a proxy to collect the statistics, doesn’t hit your disk at all. Everything is collected and aggregated in memory, it doesn’t require you to turn on logging at all to collect this information. Everybody already knows that turning on the logging on MySQL can give some “serious” performance decreases - which is why we initially looked at using the proxy, rather than log monitoring and aggregation.
Yes, we are still looking at feeding information in to the Query Analysis page from different sources, such as the logs, or even DTrace, for people that don’t want to use the proxy, but we think we’ve actually come up with the better solution for overall database performance here in the short term.
And guess what:
It works on 4.1, 5.0, 5.1, 6.0
It gets the EXPLAIN plan at the time the query was at it’s worst, not at some random time afterwards when you decide to run a log analysis
You can redirect away from the proxy when you don’t need it - no extra overhead at all (whilst server instrumentation is more code in the server, more overhead, whether you want it or not…)
Now, we do have some limitations, which are perfectly valid - we can’t monitor the queries that a slave SQL thread executes for example, and we can’t measure the IO, lock waits, etc. for each statement - but we have some plans on server instrumentation without I/O in the works for that, and Query Analysis is sure to snap that up when available.
I’ll also note that the user and table statistics that were created by Google “get this”, and I’d very much like to see those integrated in to the server. We could really make some good use of those within MySQL Enterprise Monitor as well, and I’m hopeful that we will get these integrated at some point.
So Percona - here’s a challenge - how about creating your extra query logging patches in a way that doesn’t give the I/O overhead of using logging, in the way that Google’s User/Table/Index monitoring does? That would make them compelling, for me. The statistics that have been implemented are fantastic, no doubt about that, but don’t add I/O or lock overhead (such as the current mysql.general_log and mysql.slow_log in 5.1).
Yesterday, I got an email and some fresh pictures from a happy Andrii Nikitin, Ivan’s father. He met with a Danish doctor last week, and it seems as if everything is ready for Ivan and the rest of the family to move from the German clinic to Denmark at the end of November. The Nikitin family is settling down in Denmark and Andrii is already employed by Sun Microsystems Denmark.
Andrii, the proud father, says Ivan looks like any boy now. Sure, Ivan still gets tired a bit faster than other boys his age, and sometimes, he has a small nose bleed. However, the Nikitin family is now able to live as most families do.
The generous donations by the MySQL community and Sun employees were enough to cover the expert treatment Ivan got in Germany. It wouldn’t have covered a transplantation, but it looks increasingly unlikely that one will be needed. The Nikitins and all MySQL guys at Sun are very, very grateful for the donations.
So to sum it up: Ivan Nikitin’s future is painted in much brighter colours than ever before!
I know the architect of the new Sun/MySQL query analyser, Mark Matthews, very well. Even though the tool is not OSS (and thus does not benefit from an OSS development model with early feedback in the design/development cycle, from many eyes) I know that Mark organises his designs and team in such a way that the resulting quality (barring interference) will always be very good. Anyone who uses Connector/J will surely attest to that. So I'd personally trust the result, based on my knowledge of the person who led the team. That's still a form of semi-blind trust though, OSS development makes that unnecessary which is why I'd say an OSS dev model is preferred for most things. And the new tool is, of course, bundled with the MySQL Enterprise subscription model. You have to subscribe to that service to have access to the tool.
But back to the query analyser itself. I'm really fairly pleased with Sun/MySQL launching this, because it does appear to provide actual value for clients, without weird arrangements that make people cringe. Well done, Mark & MySQL!
Now we can make an interesting observation... as Baron already noted, there are some things that only the server knows. No proxy, packet sniffing, processlist or even show status can reveal all details in the required context. And running things like SHOW SESSION STATUS just adds extra overhead (an extra round trip to the server) after each query, so while that would provide many of the otherwise missing insights, it's not really a desirable approach. This is why I too regard extra instrumentation inside the server as vital. Yet:
- Sun/MySQL, owner of the MySQL codebase (which is GPL), makes tools that are independent from the server. - Others (Percona, OurDelta) add extra instrumentation into the server, to enable better monitoring and tuning.
MySQL just released their new MySQL Query Analyzer (link to a trial), and recently wrote up an interview with Mark Matthews about it. If you haven’t read that article, go ahead and do it. I have not used this software, but I fully believe its functionality is quite nice.
But there is at least [...]
Currently I'm working hard to find and remove scalability bottlenecks in the MySQL Server. MySQL was acquired by Sun 10 months ago by now. Many people have in blogs wondered what the impact has been from this acquisition. My personal experience is that I now have a chance to work with Sun experts in DBMS performance. As usual it takes time when working on new challenges before the flow of inspiration starts flowing. However I've seen this flow of inspiration starting to come now, so the fruit of our joint work is starting to bear fruit. I now have a much better understanding of MySQL Server performance than I used to have. I know fairly well where the bottlenecks are and I've started looking into how they can be resolved.
Another interesting thing with Sun is the innovations they have done in a number of areas. One such area is DTrace. This is a really interesting tool which I already used to analyse some behaviour of MySQL Cluster internals with some success. However to analyse other storage engines inside MySQL requires a bit more work on inserting DTrace probes at appropriate places.
To work with DTrace obviously means that you need to work with an OS that supports DTrace. Solaris is such a one, I actually developed NDB Cluster (the storage engine for MySQL Cluster) on Solaris the first 5-6 years. So one would expect Solaris to be familiar to me, but working with Linux mainly for 6-7 years means that most of the Solaris memory is gone.
So how go about developing on Solaris. I decided to install a virtual machine on my desktop. As a well-behaved Sun citizen I decided to opt for VirtualBox in my choice of VM. This was an interesting challenge, very similar to my previous experiences on installing a virtual machine. It's easy to get the VM up and running, but how do you communicate with it. I found some instructions on how to set-up IP links to a virtual machine but to make life harder I have a fixed IP address on my desktop so this complicated life quite a bit. Finally I learned a lot about how to set-up virtual IP links which I already have managed to forget about :)
The next step is to get going on having a development environment for opensolaris. I soon discovered that there was a package manager in opensolaris which could be used to get all the needed packages. However after downloading a number of packages I stumbled into some serious issues. I learned from this experience that usage of Developer Previews for OS's is even worse than newly released OS's which I already know by experience isn't for the fainthearted.
So I decided to install a released opensolaris version instead (the OpenSolaris2008.05 version). After some googling I discovered a very helpful presentation at opensolaris developer how-to which explained a lot about how to install a development environment for opensolaris.
After installing opensolaris 2008.05, after following the instructions on how to install a development environment I am now equipped to develop DTrace probes and scripts and try them out on my desktop.
I definitely like the idea that opensolaris is looking more like yet another Linux distribution since it makes it a lot simpler to work with it. I would prefer GNU developer tools to be there from scratch but I have the same issue with Ubuntu.
That the system calls are different don't bother me as a programmer since different API's to similar things is something every programmer encounters if he's developing for a multi-platform environment. I even look forward to trying out a lot of Solaris system calls since there are lots of cool features on locking to CPU's, controlling CPU's for interrupts, resource groups, scheduling algorithms and so forth. I recently noted that most of these things are available on Linux as well. However I am still missing the programming API's to these features.
Lately planetmysql.com has shown an increasing amount of commercial promotion for Sun products. Many recent articles included in this supposedly MySQL-related feed aggregator have nothing to do with MySQL.
Now I see an article included in the feed from ComputerWorld, about database researcher David Dewitt getting a job at Microsoft. Not only is this non-MySQL-related content, but more than
Imagine eating meze in Ankara with a Turkish Member of the Parliament, and member of the Turkish EU Harmonisation Committee. That’s not the basic reason I joined MySQL AB back in 2001, but it’s one of the most interesting outcomes of my decision. Bottom line: The honourable Mr Osman Co
I am hoping I can get some feedback from you all. I have been excited about the support for completion in the NetBeans PHP editor (and other editors to follow). The response to my quick little blog about this has been pretty positive.
To refresh, here's an example of what it would look like:
But I have also gotten some feedback from more than one source that this may not actually be as valuable as it may first appear. The reason is that, and this makes sense, the actual work flow when building SQL for your application is as follows:
Open SQL tool and write your query
Execute, evaluate results, modify, repeat
Copy and paste final SQL into your editor
With this flow in mind, it doesn't seem that important to provide completion and other SQL editing features embedded in your application editor, because you need to test to validate that the SQL is correct.
What are your thoughts? Can you explain to me why SQL editing features in the code editor is valuable, if generally SQL is composed in a SQL tool/editor?
One thing we've talked about is enabling testing from within the code editor. But it's not clear what this would look like. We would need to be able to build up an executable SQL statement from SQL that contains substitution variables and may even be a concatenation of multiple strings. I suppose it's possible, but it seems complicated and error-prone.
I suppose one simplistic approach is the following flow:
In your editor, write a full SQL statement as a string
right-click and choose "Test SQL" (or run a hot key) to let you test the SQL string right there in the application editor
refine, retest
modify the SQL to use substitution variables
At least with this approach you wouldn't have to cut-and-paste, but I'm not sure it buys you much.
So, what are your thoughts? Your feedback much appreciated.
The MySQL database server can display a lot of performance statistics. However, most of them relate to general server parameters, such as buffer sizes and query cache settings. Of course, such settings are important to get right, and can make a huge difference in performance. Once you've tuned them however, you need to start looking at the tables and queries. When a single bad query can cost 90 - 99% of the total performance, server tuning doesn't fix the problem. But obtaining query profiling information from MySQL can be tricky. Here are some of the options:
Slow query log
You can use the built-in slow_query_log from within the server. It will show you queries that take more than a second. Such queries typically don't use indices properly and are important to address. But only slow queries end up there, and many performance problems aren't caused by slow queries, but by queries that are run very often (for example, for-loops in the code instead of JOINs). You could use complete server query logging. This will log every statement to file, but it's not recommended, as it will take too much disk I/O.
SHOW PROFILES
SHOW PROFILES command by Jeremy Cole (available from MySQL 5.0.37) can help you a great deal. Once enabled, it will gather cpu execution times and other important information on up to 100 queries. You can compare the results and see wich query is most expensive. This article explains more.
SHOW USER / INDEX / TABLE STATISTICS
The SHOW USER / INDEX / TABLE STATISTICS patches from Google add functionality to trace cpu time and much more per user, index and table. No query information is shown though, you will have to figure that out based on the tables. The patches are not part of the main MySQL distribution, but Percona provides prebuilt MySQL versions.
MySQL Enterprise Monitor's Query Analyzer
MySQL Enterprise Monitor's Query Analyzer is great for finding the heaviest queries. Using MySQL Proxy (man-in-the-middle software), it collects all queries sent to the server and ranks them by execution time. You can also see number of times the query was run, and number of rows typically returned. Being the best tool of them all, it also costs, and it requires additional software/hardware setup for the MySQL Proxy.
SHOW PROCESSLIST
Another option is the SHOW PROCESSLIST command. It shows a list of currently running processes (queries), slightly similar to the Unix ps command or Windows Task Manager. For every query, you can see how long the query has run, its user, ip and state. If you run this command just once, you will get a snapshot of what queries the server is busy doing right now. Aggregating multiple snapshots over time can give you a good view of what queries the server typically is busy performing. An advantage of SHOW PROCESSLIST is that it works on all MySQL versions without modifications and requires no additional software / hardware. On the downside, you don't necessarily catch all queries, just the queries that happen to run during every snapshot. If you collect many snapshots over long time, this is less of a problem.
Issues
You might not have the option to go for MySQL Enterprise, or install MySQL Proxy in front of your database servers. Or you might not be able to swith to the MySQL server versions containing Google's patches or SHOW PROFILES. Another problem is, that even if you do get profiling information, it is typically presented in raw text form, and might not be easy to browse through. Good tools to visualize the profiling information and navigate through the data is just as important as getting the information in the first place.
A new profiling tool
We're currently in the process of developing a tool based on the SHOW PROCESSLIST above. The tool, called Jet Profiler for MySQL, collects process list information and stores it in an internal database where it is analyzed, ranked and presented. After normalizing the queries, top lists are created of:
most frequent queries
most frequent users
most frequent tables
most frequent schemas
By looking at the state of the processes, additional information can be extracted, such as lock-prone queries (if you're using MyISAM tables), or queries which create a lot of temp tables on disk. The tool is a client desktop app so no server changes are required. It works on all MySQL versions (3.23 - 6.0).
Better visualization and usability
To provide good visualization, the information is presented in a line chart over time. You can zoom in on spikes and other interesting time intervals, and see the top queries / users / etc for that particular time frame. Every top list is accompanied by a pie chart to make it easier to compare the impact of different items. An experimental EXPLAIN visualization feature is underway, helping you understand the queries better.
Beta testers are welcome
We still have lots of adjustments and features before a public release, but we are looking beta testers. If you are interested, click here.
of course the real aim should be to scale with one instance on the machine as scaling with multiple instances on the one machine isn’t scaling at all - it’s scale out, but with more problems (now when one machine goes down, so do 1110202434 database instances).
There will be a Moscow MySQL User Group meeting with Kaj in High School of Economics:
Date: Monday, 1st of December Time: 19-00 Topic: Sun and MySQL: what's happening Location: Myasnitskaya 20, room 124
Everybody is welcome to join. To come, you need to register (there is security at the entrance to the building, and it needs a list of names, usual stuff), there is no url for a registration form yet ;), you could just leave a comment on this entry, I'll add you to the list.
Memory leaks (missed a couple of mysql_free_resultset) https://bugs.launchpad.net/cmon/+bug/300225
There are also some code cleanups. To upgrade from earlier version:
start a mysql client on the mysql server(s) having the cmon database
use cmon
drop mysql_statistics table
drop mysql_variables table
start cmon which will then recreate those tables
The above is particularly important if you move from MySQL Cluster 6.3.17 or earlier to 6.3.18 (or later) because the number of statistic counters and variables has increased between the versions. cmon will build up the mysql_statistics and mysql_variables tables dynamically instead (if you have dropped them earlier).
Admins in the U.S got a chance to talk shop at EDUCAUSE and the LISA ‘08 summit, but what about those in Europe? Don’t despair, because we’re co-hosting a mini-conference with MySQL, SchoolForge, RedHat, Sun, Op5, and Fusis at our UK headquarters in London on November 27th.
While the presentations are specifically aimed at education as well as the non-profit sector, anyone is welcome to attend. Engage us in discussions about your thoughts on the latest technology, how it blends with your school’s ICT strategy, or tackle open source trends for the future. Details on the schedule are over at OpenSourceInSchools.org.uk (of course registration is free, and besides providing the specifics it also gets you a complimentary lunch).
MySQL Performance Tuning is always a popular topic for DBAs and Developers. Traditionally, database tuning has required manual effort and custom scripts. MySQL Query Analyzer aims to minimize the manual effort, with easy-to-use tools to help find-n-fix problem queries. In this interview, we talk with Mark Matthews, to get an architects view of the product.
I wanted to call out one database tooling feature in NetBeans 6.5 that I think is pretty cool, and it's a little bit hidden.
If you execute a query and you get results, you can then do a number of interesting things with them.
Things like show what the CREATE TABLE statement would look like, or copy and paste the data you have selected (the paste format is hard-coded right now, but next release we plan to support CSV and XML output).
The one I like the best is that you can generate INSERT statements for the data.
Combine this with generating the CREATE TABLE statement, and it makes it really easy to hand your data set to someone else for testing or collaboration.
One of the problems with living in times as they are now is the lack of funds for what I like to call non-directed autodidactic exploration. Some of you would call this 'investigating new tools' but most would call it getting 'toys'. How do you know if D-trace, the MySQL Query Analyzer, or the next new announcement is that new tool that you can not live without if you do not get a chance to live with it? In the past I was able to rig an old laptop, my new laptop and an old development system to play, er, experiment with MySQL Cluster before taking my Cluster DBA exam.
Cisco CCIE candidates end up with an expensive pile of routers, hubs, switches, and cables in preperation for their lab exam. I wanted to get back up to speed on Solaris and went hunting used hardware for Sparc system so I could learn all the new items added since the 2.5.1/2.6 days (and boy did they add stuff). In the back of my mind is using some holiday time to work on a Solaris 10 Certification.
I also wanted hardware to let me investigate OpenSolaris, Glassfish, and the latest BSD-based releases. Plus there is that experiment I want to do with the Proxy Server and sharding. The list of wants was a lot longer than the funds on hand. And the idea of adding a pyramid of box past their prime did not appeal to me nor did the prospect of a higher electrical bill.
Virtual Box is a virtualization software that is freely available under a GPL license. Very quickly I was able to install Virtual box on my Mac and configure Ubuntu and OpenSolaris clients.
Nothing is really free. Each client takes a dozen or so GB of disk space. Running both clients, NeoOffice and Thunderbird at the same time makes my MacBook work hard. The good news is that is easy to remove the clients so that old opportunity cost is relatively low.
Now I have Glassfish and OpenSolaris on one virtual machine and can use my Mac and the virtual Ubuntu as clients. I can then query a MySQL 5.1 instance on the Ubuntu system and watch the Proxy Server re-route the request. Plus I was able to load the latest Zend Framework without having to disturb an older version on my main development system. Total cost out of pocket $0. Disk space cost about 25 GB. Time spent? Well that took about an hour for the first virtual server and twenty minutes for the second.
But now I have systems that I can use to explore all those new tools without having to crack open the wallet. And I have all these new toys, er, tools to learn. Times are good.
Eat your own dog food. The latest development version of MySQL Workbench successfully runs a pre-alpha snapshot of the MySQL Driver for C++ since a few weeks. Enjoy your pizza at my costs, Andrey (Hristov). I lost my bet. Less than five bugs have been found when migrating MySQL Workbench to Connector/C++.
MySQL Connector/C++ is now being used by two "internal customers": MySQL Workbench and MySQL Connector/OpenOffice.org. And our internal development version of Connector/OpenOffice.org runs on Connector/C++ as well. The preview version of Connector/OpenOffice.org was using the MySQL Client Library (C-API)
Yesterday we announced what the Waffle Grid project is. Some people ignored it, others I am sure read it with curiosity, but i figure their is a small subset of folks out their who said: “WOW!!! That is really cool. I want a Waffle Grid now!”… I wonder if these are the same people who run out and buy the latest SSD harddrive because its fast and the latest nvidia card because they can one up all their friends… To those who want it now, but do not know where to begin I let me give you a quick install tutorial. Some of this already exists on the wiki, but hopefully I will make it clearer here.
After some discussions on the OpenSQLCamp 2008 conference we decided to move our development to Launchpad, to be in stream with other MySQL related projects.
I got some really awesome feedback from Giuseppe (the Data Charmer) in my preivous post and it gave me a few ideas I thought I would share. One of the really nice things he did in his article about partitioning was to test partitions with the ARCHIVE engine. His results were mixed (and, actually, so were mine), but I did want to give it a spin. Here is what I ended up with:
DROP TABLE IF EXISTS Logger;
CREATE TABLE `Logger` (
`timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dateOccurred` date NOT NULL,
`session` char(32) DEFAULT NULL,
`host` varchar(255) DEFAULT NULL,
`sslMode` enum('enabled','disabled') DEFAULT 'enabled',
`requestURI` varchar(255) DEFAULT NULL,
`referer` varchar(255) DEFAULT NULL,
`userAgent` varchar(255) DEFAULT NULL,
`remoteHost` int(10) unsigned NOT NULL
) ENGINE=ARCHIVE DEFAULT CHARSET=utf8
PARTITION BY RANGE (TO_DAYS(dateOccurred)) (
PARTITION p0801 VALUES LESS THAN (TO_DAYS('2008-02-01')),
PARTITION p0802 VALUES LESS THAN (TO_DAYS('2008-03-01')),
PARTITION p0803 VALUES LESS THAN (TO_DAYS('2008-04-01')),
PARTITION p0804 VALUES LESS THAN (TO_DAYS('2008-05-01')),
PARTITION p0805 VALUES LESS THAN (TO_DAYS('2008-06-01')),
PARTITION p0806 VALUES LESS THAN (TO_DAYS('2008-07-01')),
PARTITION p0807 VALUES LESS THAN (TO_DAYS('2008-08-01')),
PARTITION p0808 VALUES LESS THAN (TO_DAYS('2008-09-01')),
PARTITION p0809 VALUES LESS THAN (TO_DAYS('2008-10-01')),
PARTITION p0810 VALUES LESS THAN (TO_DAYS('2008-11-01')),
PARTITION p0811 VALUES LESS THAN (TO_DAYS('2008-12-01')),
PARTITION p0812 VALUES LESS THAN (TO_DAYS('2009-01-01'))
);
So, as before, I am still partitioning by year and month using the TO_DAYS() function. I also borrowed from Giuseppe's work and used the TO_DAYS function when defining each PARTITION, so that at least the original CREATE TABLE statement is a bit more clear (although running a SHOW CREATE TABLE against this monster still results in a somewhat confusing mess). To help clean things up a bit further, instead of simply naming the partitions p0, p1, and so on, I opted to include the year and month. It makes it slightly easier to read.
The question is, did this make a difference? Before moving it over to an Archive table, the Logger table was a MyISAM table with indexes on date, and session:
This size of this table was about 206MB (171MB of data and 36M of indexes). After switching over to the ARCHIVE engine (while making a few changes to the table-structure), the result was around 26MB on disk. That's a pretty huge difference, and I'm even storing a bit more data. I added a date field in addition to the timestamp field because I was unable to get partitioning to work correctly when partitioning on a timestamp. I'm actually hoping I can one day just drop the date field and use timestamps with partitions at some point - seems like it should work, though I won't claim to be an expert in this regard.
Performance-wise, it looks like things the speed of pulling my reports is about the same. Actually it might be a bit slower than before, but not by much. Of course, my data-set is quite small - high traffic sites may see very different results. The performance, however, is obviously much better than using an ARCHIVE engine without partitions. I haven't run any tests this time around, but when I was using the ARCHIVE engine before without partitions, pulling a smaller set of data took 2-3x longer than it now does.
Now obviously this only works when one is able to use partitions. The data I pull is usually over a small say of days and so can get lumped in a few partitions. Doing aggregate results on the whole data-set is still going to be quite slow. For me, I don't plan on doing those sorts of things very often and so the end result, while not as cool as I had hoped, does appear to be an improvement. Certainly the fact that I am using noticeably less disk space is pretty neat. And, at least in my tests, ARCHIVE has phenomenal insert performance.
Big thanks goes to Giuseppe for giving me some nice ideas, and for is excellent article on partitioning in 5.1. He's planning on posting additional information on his blog, so I recommend keeping an eye out for it!
MySQL has long built a great database. It's increasingly also building a great database business.
MySQL made two big announcements on Wednesday, one product-related (the final release of MySQL 5.1) and the second licensing model-related (improvements to the subscription-only MySQL Enterprise Monitor service with Query Analyzer). Of the two, I believe the latter is the more important as it helps Sun to monetize the research and development investments it has been making in the MySQL product.
The product announcement is that MySQL 5.1 will ship on or before December 6. Sun announced MySQL 5.1 back in April, but now it's ready for release. MySQL 5.1 is not important because it adds transactions capabilities (MySQL has had this functionality for years), but rather because it augments MySQL's sweet spot: industry-beating scalability and performance.
Better than Oracle? Absolutely, as Sun senior vice president of Database Products, Marten Mickos, told me in a phone interview:
Performance and total cost of ownership are the two areas where we beat Oracle. Having said that, we aren't adding new functionality in order to compete with Oracle, but rather to serve our existing market and new markets.
Which markets? As the enterprise moves applications to the Web, that's MyS