Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to pivot a MySQL entity-attribute-value schema

The question mentions MySQL, and in fact this DBMS has a special function for this kind of problem: GROUP_CONCAT(expr). Take a look in the MySQL reference manual on group-by-functions. The function was added in MySQL version 4.1. You'll be using GROUP BY FileID in the query.

I'm not really sure about how you want the result to look. If you want every attribute listed for every item (even if not set), it will be harder. However, this is my suggestion for how to do it:

SELECT bt.FileID, Title, Author, 
 GROUP_CONCAT(
  CONCAT_WS(':', at.AttributeName, at.AttributeType, avt.AttributeValue) 
  ORDER BY at.AttributeName SEPARATOR ', ') 
FROM BaseTable bt JOIN AttributeValueTable avt ON avt.FileID=bt.FileID 
 JOIN AttributeTable at ON avt.AttributeId=at.AttributeId 
GROUP BY bt.FileID;

This gives you all attributes in the same order, which could be useful. The output will be like the following:

'F001', 'Dox', 'vinay', 'CustomAttr1:varchar(40):Akash, CustomUseDate:Datetime:2009/03/02'

This way you only need one single DB query, and the output is easy to parse. If you want to store the attributes as real Datetime etc. in the DB, you'd need to use dynamic SQL, but I'd stay clear from that and store the values in varchars.


The general form of such a query would be

SELECT file.*,
   attr1.value AS 'Attribute 1 Name', 
   attr2.value AS 'Attribute 2 Name', 
   ...
FROM
   file 
   LEFT JOIN attr AS attr1 
      ON(file.FileId=attr1.FileId and attr1.AttributeId=1)
   LEFT JOIN attr AS attr2 
      ON(file.FileId=attr2.FileId and attr2.AttributeId=2)
   ...

So you need to dynamically build your query from the attributes you need. In php-ish pseudocode

$cols="file";
$joins="";

$rows=$db->GetAll("select * from Attributes");
foreach($rows as $idx=>$row)
{
   $alias="attr{$idx}";
   $cols.=", {$alias}.value as '".mysql_escape_string($row['AttributeName'])."'";   
   $joins.="LEFT JOIN attr as {$alias} on ".
       "(file.FileId={$alias}.FileId and ".
       "{$alias}.AttributeId={$row['AttributeId']}) ";
}

 $pivotsql="select $cols from file $joins";

If you're looking for something more usable (and joinable) than a group-concat result, try this solution below. I've created some tables very similar to your example to make this make sense.

This works when:

  • You want a pure SQL solution (no code, no loops)
  • You have a predictable set of attributes (e.g. not dynamic)
  • You are OK updating the query when new attribute types need to be added
  • You would prefer a result that can be JOINed to, UNIONed, or nested as a subselect

Table A (Files)

FileID, Title, Author, CreatedOn

Table B (Attributes)

AttrID, AttrName, AttrType [not sure how you use type...]

Table C (Files_Attributes)

FileID, AttrID, AttrValue

A traditional query would pull many redundant rows:

SELECT * FROM 
Files F 
LEFT JOIN Files_Attributes FA USING (FileID)
LEFT JOIN Attributes A USING (AttributeID);
AttrID  FileID  Title           Author  CreatedOn   AttrValue   AttrName    AttrType
50      1       TestFile        Joe     2011-01-01  true        ReadOnly        bool
60      1       TestFile        Joe     2011-01-01  xls         FileFormat      text
70      1       TestFile        Joe     2011-01-01  false       Private         bool
80      1       TestFile        Joe     2011-01-01  2011-10-03  LastModified    date
60      2       LongNovel       Mary    2011-02-01  json        FileFormat      text
80      2       LongNovel       Mary    2011-02-01  2011-10-04  LastModified    date
70      2       LongNovel       Mary    2011-02-01  true        Private         bool
50      2       LongNovel       Mary    2011-02-01  true        ReadOnly        bool
50      3       ShortStory      Susan   2011-03-01  false       ReadOnly        bool
60      3       ShortStory      Susan   2011-03-01  ascii       FileFormat      text
70      3       ShortStory      Susan   2011-03-01  false       Private         bool
80      3       ShortStory      Susan   2011-03-01  2011-10-01  LastModified    date
50      4       ProfitLoss      Bill    2011-04-01  false       ReadOnly        bool
70      4       ProfitLoss      Bill    2011-04-01  true        Private         bool
80      4       ProfitLoss      Bill    2011-04-01  2011-10-02  LastModified    date
60      4       ProfitLoss      Bill    2011-04-01  text        FileFormat      text
50      5       MonthlyBudget   George  2011-05-01  false       ReadOnly        bool
60      5       MonthlyBudget   George  2011-05-01  binary      FileFormat      text
70      5       MonthlyBudget   George  2011-05-01  false       Private         bool
80      5       MonthlyBudget   George  2011-05-01  2011-10-20  LastModified    date

This coalescing query (approach using MAX) can merge the rows:

SELECT
F.*,
MAX( IF(A.AttrName = 'ReadOnly', FA.AttrValue, NULL) ) as 'ReadOnly',
MAX( IF(A.AttrName = 'FileFormat', FA.AttrValue, NULL) ) as 'FileFormat',
MAX( IF(A.AttrName = 'Private', FA.AttrValue, NULL) ) as 'Private',
MAX( IF(A.AttrName = 'LastModified', FA.AttrValue, NULL) ) as 'LastModified'
FROM 
Files F 
LEFT JOIN Files_Attributes FA USING (FileID)
LEFT JOIN Attributes A USING (AttributeID)
GROUP BY
F.FileID;
FileID  Title           Author  CreatedOn   ReadOnly    FileFormat  Private LastModified
1       TestFile        Joe     2011-01-01  true        xls         false   2011-10-03
2       LongNovel       Mary    2011-02-01  true        json        true    2011-10-04
3       ShortStory      Susan   2011-03-01  false       ascii       false   2011-10-01
4       ProfitLoss      Bill    2011-04-01  false       text        true    2011-10-02
5       MonthlyBudget   George  2011-05-01  false       binary      false   2011-10-20

This is the standard "rows to columns" problem in SQL.

It is most easily done outside SQL.

In your application, do the following:

  1. Define a simple class to contain the file, the system attributes, and a Collection of user attributes. A list is a good choice for this collection of customer attributes. Let's call this class FileDescription.

  2. Execute a simple join between the file and all of the customer attributes for the file.

  3. Write a loop to assemble FileDescriptions from the query result.

    • Fetch the first row, create a FileDescription and set the first customer attribute.

    • While there are more rows to fetch:

      • Fetch a row
      • If this row's file name does not match the FileDescription we're building: finish building a FileDescription; append this to a result Collection of File Descriptions; create a fresh, empty FileDescription with the given name and first customer attribute.
      • If this row's file name matches the FileDescription we're building: append another customer attribute to the current FileDescription

I have been experimenting with the different answers and Methai's answer was the most convenient for me. My current project, although it does uses Doctrine with MySQL, has quite a few loose tables.

The following is the result of my experience with Methai's solution:

create entity table

DROP TABLE IF EXISTS entity;
CREATE TABLE entity (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255),
    author VARCHAR(255),
    createdOn DATETIME NOT NULL
) Engine = InnoDB;

create attribute table

DROP TABLE IF EXISTS attribute;
CREATE TABLE attribute (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    type VARCHAR(255) NOT NULL
) Engine = InnoDB;

create attributevalue table

DROP TABLE IF EXISTS attributevalue;
CREATE TABLE attributevalue (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    value VARCHAR(255) NOT NULL,
    attribute_id INT UNSIGNED NOT NULL,
    FOREIGN KEY(attribute_id) REFERENCES attribute(id)
 ) Engine = InnoDB;

create entity_attributevalue join table

DROP TABLE IF EXISTS entity_attributevalue;
CREATE TABLE entity_attributevalue (
    entity_id INT UNSIGNED NOT NULL,
    attributevalue_id INT UNSIGNED NOT NULL,
    FOREIGN KEY(entity_id) REFERENCES entity(id),
    FOREIGN KEY(attributevalue_id) REFERENCES attributevalue(id)
) Engine = InnoDB;

populate entity table

INSERT INTO entity
    (title, author, createdOn)
VALUES
    ('TestFile', 'Joe', '2011-01-01'),
    ('LongNovel', 'Mary', '2011-02-01'),
    ('ShortStory', 'Susan', '2011-03-01'),
    ('ProfitLoss', 'Bill', '2011-04-01'),
    ('MonthlyBudget', 'George', '2011-05-01'),
    ('Paper', 'Jane', '2012-04-01'),
    ('Essay', 'John', '2012-03-01'),
    ('Article', 'Dan', '2012-12-01');

populate attribute table

INSERT INTO attribute
    (name, type)
VALUES
    ('ReadOnly', 'bool'),
    ('FileFormat', 'text'),
    ('Private', 'bool'),
    ('LastModified', 'date');

populate attributevalue table

INSERT INTO attributevalue 
    (value, attribute_id)
VALUES
    ('true', '1'),
    ('xls', '2'),
    ('false', '3'),
    ('2011-10-03', '4'),
    ('true', '1'),
    ('json', '2'),
    ('true', '3'),
    ('2011-10-04', '4'),
    ('false', '1'),
    ('ascii', '2'),
    ('false', '3'),
    ('2011-10-01', '4'),
    ('false', '1'),
    ('text', '2'),
    ('true', '3'),
    ('2011-10-02', '4'),
    ('false', '1'),
    ('binary', '2'),
    ('false', '3'),
    ('2011-10-20', '4'),
    ('doc', '2'),
    ('false', '3'),
    ('2011-10-20', '4'),
    ('rtf', '2'),
    ('2011-10-20', '4');

populate entity_attributevalue table

INSERT INTO entity_attributevalue 
    (entity_id, attributevalue_id)
VALUES
    ('1', '1'),
    ('1', '2'),
    ('1', '3'),
    ('1', '4'),
    ('2', '5'),
    ('2', '6'),
    ('2', '7'),
    ('2', '8'),
    ('3', '9'),
    ('3', '10'),
    ('3', '11'),
    ('3', '12'),
    ('4', '13'),
    ('4', '14'),
    ('4', '15'),
    ('4', '16'),
    ('5', '17'),
    ('5', '18'),
    ('5', '19'),
    ('5', '20'),
    ('6', '21'),
    ('6', '22'),
    ('6', '23'),
    ('7', '24'),
    ('7', '25');

Showing all the records

SELECT * 
FROM `entity` e
LEFT JOIN `entity_attributevalue` ea ON ea.entity_id = e.id
LEFT JOIN `attributevalue` av ON ea.attributevalue_id = av.id
LEFT JOIN `attribute` a ON av.attribute_id = a.id;
id  title           author  createdOn           entity_id   attributevalue_id   id      value       attribute_id    id      name            type
1   TestFile        Joe     2011-01-01 00:00:00 1           1                   1       true        1               1       ReadOnly        bool
1   TestFile        Joe     2011-01-01 00:00:00 1           2                   2       xls         2               2       FileFormat      text
1   TestFile        Joe     2011-01-01 00:00:00 1           3                   3       false       3               3       Private         bool
1   TestFile        Joe     2011-01-01 00:00:00 1           4                   4       2011-10-03  4               4       LastModified    date
2   LongNovel       Mary    2011-02-01 00:00:00 2           5                   5       true        1               1       ReadOnly        bool
2   LongNovel       Mary    2011-02-01 00:00:00 2           6                   6       json        2               2       FileFormat      text
2   LongNovel       Mary    2011-02-01 00:00:00 2           7                   7       true        3               3       Private         bool
2   LongNovel       Mary    2011-02-01 00:00:00 2           8                   8       2011-10-04  4               4       LastModified    date
3   ShortStory      Susan   2011-03-01 00:00:00 3           9                   9       false       1               1       ReadOnly        bool
3   ShortStory      Susan   2011-03-01 00:00:00 3           10                  10      ascii       2               2       FileFormat      text
3   ShortStory      Susan   2011-03-01 00:00:00 3           11                  11      false       3               3       Private         bool
3   ShortStory      Susan   2011-03-01 00:00:00 3           12                  12      2011-10-01  4               4       LastModified    date
4   ProfitLoss      Bill    2011-04-01 00:00:00 4           13                  13      false       1               1       ReadOnly        bool
4   ProfitLoss      Bill    2011-04-01 00:00:00 4           14                  14      text        2               2       FileFormat      text
4   ProfitLoss      Bill    2011-04-01 00:00:00 4           15                  15      true        3               3       Private         bool
4   ProfitLoss      Bill    2011-04-01 00:00:00 4           16                  16      2011-10-02  4               4       LastModified    date
5   MonthlyBudget   George  2011-05-01 00:00:00 5           17                  17      false       1               1       ReadOnly        bool
5   MonthlyBudget   George  2011-05-01 00:00:00 5           18                  18      binary      2               2       FileFormat      text
5   MonthlyBudget   George  2011-05-01 00:00:00 5           19                  19      false       3               3       Private         bool
5   MonthlyBudget   George  2011-05-01 00:00:00 5           20                  20      2011-10-20  4               4       LastModified    date
6   Paper           Jane    2012-04-01 00:00:00 6           21                  21      binary      2               2       FileFormat      text
6   Paper           Jane    2012-04-01 00:00:00 6           22                  22      false       3               3       Private         bool
6   Paper           Jane    2012-04-01 00:00:00 6           23                  23      2011-10-20  4               4       LastModified    date
7   Essay           John    2012-03-01 00:00:00 7           24                  24      binary      2               2       FileFormat      text
7   Essay           John    2012-03-01 00:00:00 7           25                  25      2011-10-20  4               4       LastModified    date
8   Article         Dan     2012-12-01 00:00:00 NULL        NULL                NULL    NULL        NULL            NULL    NULL            NULL

pivot table

SELECT e.*,
    MAX( IF(a.name = 'ReadOnly', av.value, NULL) ) as 'ReadOnly',
    MAX( IF(a.name = 'FileFormat', av.value, NULL) ) as 'FileFormat',
    MAX( IF(a.name = 'Private', av.value, NULL) ) as 'Private',
    MAX( IF(a.name = 'LastModified', av.value, NULL) ) as 'LastModified'
FROM `entity` e
LEFT JOIN `entity_attributevalue` ea ON ea.entity_id = e.id
LEFT JOIN `attributevalue` av ON ea.attributevalue_id = av.id
LEFT JOIN `attribute` a ON av.attribute_id = a.id
GROUP BY e.id;
id  title           author  createdOn           ReadOnly    FileFormat  Private LastModified
1   TestFile        Joe     2011-01-01 00:00:00 true        xls         false   2011-10-03
2   LongNovel       Mary    2011-02-01 00:00:00 true        json        true    2011-10-04
3   ShortStory      Susan   2011-03-01 00:00:00 false       ascii       false   2011-10-01
4   ProfitLoss      Bill    2011-04-01 00:00:00 false       text        true    2011-10-02
5   MonthlyBudget   George  2011-05-01 00:00:00 false       binary      false   2011-10-20
6   Paper           Jane    2012-04-01 00:00:00 NULL        binary      false   2011-10-20
7   Essay           John    2012-03-01 00:00:00 NULL        binary      NULL    2011-10-20
8   Article         Dan     2012-12-01 00:00:00 NULL        NULL        NULL    NULL