
How can I access MySQL InnoDB index values directly without the MySQL client?

I've got an index on columns a VARCHAR(255) and b INT in an InnoDB table. Given two a,b pairs, can I use the MySQL index to determine whether the pairs are the same from a C program (i.e. without using a strcmp and a numerical comparison)?

  1. Where is a MySQL InnoDB index stored in the file system?
  2. Can it be read and used from a separate program? What is the format?
  3. How can I use an index to determine if two keys are the same?

Note: An answer to this question should either a) provide a method for accessing a MySQL index in order to accomplish this task or b) explain why the MySQL index cannot practically be accessed/used in this way. A platform-specific answer is fine, and I'm on Red Hat 5.8.


Below is the previous version of this question, which provides more context but seems to distract from the actual question. I understand that there are other ways to accomplish this example within MySQL, and I provide two. This is not a question about optimization, but rather of factoring out a piece of complexity that exists across many different dynamically generated queries.

I could accomplish my query using a subselect with a subgrouping, e.g.

SELECT c, AVG(max_val)
FROM (
    SELECT c, MAX(val) AS max_val
    FROM table
    GROUP BY a, b) AS t
GROUP BY c

But I've written a UDF that allows me to do it with a single select, e.g.

SELECT b, MY_UDF(a, b, val)
FROM table
GROUP BY c

The key here is that I pass the fields a and b to the UDF, and I manually manage a,b subgroups in each group. Column a is a varchar, so this involves a call to strncmp to check for matches, but it's reasonably fast.
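
For illustration, here is a minimal sketch of the kind of subgroup bookkeeping described above. It is not the actual UDF: the names subgroup, group_state, same_key, add_row and avg_of_max, the fixed capacity, and the linear scan are all hypothetical. It assumes the string argument arrives with an explicit byte length, that a fits in 255 bytes (a single-byte character set), and that a bytewise match is what is wanted. MySQL itself compares VARCHAR values according to the column's collation, so this only agrees with GROUP BY a, b under a binary collation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBGROUPS 64   /* hypothetical fixed capacity for this sketch */
#define MAX_A_LEN 255      /* a VARCHAR(255), assuming one byte per character */

/* One (a, b) subgroup and the largest val seen for it so far. */
typedef struct {
    char a[MAX_A_LEN];
    size_t a_len;          /* explicit length; a is not NUL-terminated */
    long long b;           /* assumption: the integer column is held as long long */
    double max_val;
} subgroup;

/* All subgroups seen within the current GROUP BY c group. */
typedef struct {
    subgroup items[MAX_SUBGROUPS];
    size_t count;
} group_state;

/* Bytewise key equality: same length and bytes for a, same integer for b. */
static int same_key(const subgroup *s, const char *a, size_t a_len, long long b)
{
    return s->b == b && s->a_len == a_len && memcmp(s->a, a, a_len) == 0;
}

/* Record one row: raise the matching subgroup's max, or start a new subgroup. */
static void add_row(group_state *g, const char *a, size_t a_len,
                    long long b, double val)
{
    assert(a_len <= MAX_A_LEN);
    for (size_t i = 0; i < g->count; i++) {
        if (same_key(&g->items[i], a, a_len, b)) {
            if (val > g->items[i].max_val)
                g->items[i].max_val = val;
            return;
        }
    }
    assert(g->count < MAX_SUBGROUPS);
    subgroup *s = &g->items[g->count++];
    memcpy(s->a, a, a_len);
    s->a_len = a_len;
    s->b = b;
    s->max_val = val;
}

/* Average of the per-subgroup maxima; 0.0 for an empty group. */
static double avg_of_max(const group_state *g)
{
    double sum = 0.0;
    for (size_t i = 0; i < g->count; i++)
        sum += g->items[i].max_val;
    return g->count ? sum / (double)g->count : 0.0;
}
```

In this sketch, add_row would be called once per row of the group and avg_of_max once at the end of the group. The linear scan keeps the example short; a hash table keyed on (a, b) would be the usual choice when a group contains many subgroups.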

However, I have an index my_key (a ASC, b ASC). Instead of checking for matches on a and b manually, can I just access and use the MySQL index? That is, can I get the index value in my_key for a given row or a,b pair in C (inside the UDF)? And if so, would the index value be guaranteed to be unique for any value of a,b?

I would like to call MY_UDF(a, b, val) and then look up the MySQL index value for (a,b) in C from the UDF.

jmilloy asked Nov 09 '12 23:11


2 Answers

Look back at your original query

SELECT c, AVG(max_val)
FROM
(
    SELECT c, MAX(val) AS max_val
    FROM table
    GROUP BY a, b
) AS t
GROUP BY c;

You should first make sure the subselect gives you what you want by running

SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b;

If the result of the subselect is correct, then run your full query. If that result is correct, then you should do the following:

ALTER TABLE `table` ADD INDEX abc_ndx (a,b,c,val);

This will speed up the query by getting all needed data from the index only. The source table never needs to be consulted.

Writing a UDF and calling it from a single SELECT just masquerades a subselect and creates more overhead than the query needs. Simply placing your full query (one nested pass over the data) in a stored procedure will be more effective than getting most of the data in the UDF and executing single-row selects iteratively (something like O(n log n) running time, with possibly longer Sending data states).

UPDATE 2012-11-27 13:46 EDT

You can access the index without touching the table by doing two things

  • Create a decent Covering Index

    ALTER TABLE `table` ADD INDEX abc_ndx (a,b,c,val);

  • Run the SELECT query I mentioned before

Since all the columns of the query are in the index, the Query Optimizer will only touch the index (or precached index pages). If the table is MyISAM, you can ...

  1. set up the MyISAM table with a dedicated key cache that can be preloaded on mysqld startup
  2. run SELECT a,b,c,val FROM `table`; to load index pages into MyISAM's default key cache

Trust me, you really do not want to access index pages against mysqld's will. What do I mean by that?

For MyISAM, the index pages of a table are stored in the table's .MYI file. Each DML statement takes a full table lock.

For InnoDB, the index pages are loaded into the InnoDB Buffer Pool. Consequently, the associated data pages will load into the InnoDB Buffer Pool as well.

You should not try to bypass mysqld and read index pages from Python, Perl, PHP, C++, or Java, because of the constant I/O needed by MyISAM and the MVCC protocols constantly being exercised by InnoDB.

There is a NoSQL paradigm (called HandlerSocket) that permits low-level access to MySQL tables and can cleanly bypass mysqld's normal access patterns. I would not recommend it, since it had a bug when used to issue writes.

UPDATE 2012-11-30 12:11 EDT

From your last comment

I'm using InnoDB, and I can see how the MVCC model complicates things. However, apparently InnoDB stores only one version (the most recent) in the index. The access pattern for the relevant tables is write-once, read-many, so if the index could be accessed, it could provide a single, reliable datum for each key.

When it comes to InnoDB, MVCC is not complicating anything. It can actually become your best friend provided:

  • you have autocommit enabled (it should be enabled by default)
  • the access pattern for the relevant tables is write-once, read-many

I would expect the accessed index pages to sit in the InnoDB Buffer Pool virtually forever if they are read repeatedly. I would just make sure your innodb_buffer_pool_size is set high enough to hold the necessary InnoDB data.

RolandoMySQLDBA answered Oct 31 '22 05:10


If you just want to access an index outside of MySQL, you will have to use the API for one of the MySQL storage engines. The default engine is InnoDB. See overview here: InnoDB Internals. This describes (at a very high level) both the data layout on disk and the APIs to access it. A more detailed description is here: Embedded InnoDB.

However, rather than write your own program that uses InnoDB APIs directly (which is a lot of work), you might use one of the projects that have already done that work:

  • HandlerSocket: gives NoSQL access to InnoDB tables, runs as a plugin inside mysqld. See a very informative blog post from the developer. The goal of HandlerSocket is to provide a NoSQL interface exposed as a network daemon, but you could use the same technique (and much of the same code) to provide something that would be used by a query within MySQL.

  • memcached InnoDB plugin: gives memcached-style access to InnoDB tables.

  • HailDB: gives NoSQL access to InnoDB tables, runs on top of Embedded InnoDB. See conference presentation. EDIT: HailDB probably won't work running side-by-side with MySQL.

I believe the first two can run side-by-side with MySQL (using the same tables live) and can be used from C, so they do meet your requirements.

If you can use or migrate to MySQL Cluster, see also the NDB API, a direct API, and ndbmemcache, a way to access MySQL Cluster using the memcache API.

This is hard to answer without knowing why you are trying to do this, because the implications of different approaches are very different.

Alex I answered Oct 31 '22 03:10