EAV is one of those love/hate relationships in Magento. EAV provides great flexibility but with potentially large performance implications. Knowing this is a sensitive topic does not however diminish my curiosity. I know there will be some passionate opinions out there in the community. But I also know there is a lot of expertise in the community which I also enjoy learning from. So I hereby don my flame retardant suit and dive in.
Disclaimer: This post is my own and is no indication or commitment from eBay. Seriously. There are no plans to change anything at present in Magento 2. Seriously. But its useful for me to learn from the community for when such discussions come up.
How It Works
Very briefly, EAV is where for each additional attribute instead of adding another column to a table, you create a new row in an EAV table where you have a key, attribute name, and attribute value. This means when doing SQL queries you have to do lots of outer joins to get the full record back, but it does make it easier to add new attributes to a table without database schema changes.
Why Use It
One example in Magento is a merchant can define their own attributes for products and store them. No need to do database schema changes. Extensions can also use it without introducing schema changes.
How Could it be Done Differently in Magento 2?
Well, actually this is where I was hoping to learn from the community. Have you tried implementing any other approaches and have practical experience of what worked well or badly? I have talked to people and read some blog posts. This post is to cast a wider net to see what I have missed.
Some examples I know of
- Add table columns, at least for common EAV attributes. (I believe it is faster to do column adds now in MySQL than in the past.)
- Use MongoDB or similar which has a richer data module – its easy to add additional attributes then. But it is more open heart surgery on your Magento 1 installation to achieve this, and transactional integrity is harder to guarantee with some data in MySQL and some in MongoDB.
- Instead of an EAV table, use a CLOB field to store say JSON encoded name/value pairs in the main table. Makes it harder to do SQL queries or use the results, but no need for join queries.
I was curious about other options and real world experiences. For example, how often do people query on EAV attributes? Do the number and size become large? Does a JSON field end up worse that separate tables with lots of join queries? And so on.
I know there will probably be some passionate opinions expressed on this one. That is fine. Please play fair, my flame retardant suit is on, but fire away with your comments! 😉
Thanks Alan for bringing that to the table. In Magento 1 we have seen that the EAV concept isn’t always an advantage so the developers introduced sales_flat_order|quote + grid tables and catalog_category|product_flat tables. I think that could also make sense for customer and customer_address tables. However the EAV definitions might still be useful to save options and translations as well as form definitions (see customer frontend/backend forms in EE). Although this could even be replaced by some configuration (core_config_data) or XML structure.
But what would definitely make sense is to melt down database complexity (-> catalog index tables) as today there are tools available to perform DDL statements without interruption of DML statements. Also Magento should make even more use of advanced MySQL features like triggers, (materialized) views and procedures.
However the goal should be to reach better performance for large databases. Catalogs with more than 100k products, 1 million customers and 5k orders per day shouldn’t be a challenge for a modern eCommerce platform. I guess that can be achieved with MySQL only but the database design should also allow entity separation, e.g. transaction, stock and catalog information in different databases/schemas/servers. That would also open the possibility to move catalog information to a different storage like MongoDB, to scale the database layer and/or create and deploy seasonal/versioned catalogs (EE’s content staging still offers room for some improvements ;-)).
Well, as with many other aspects of Magento (both 1 and 2) I don’t think Magento should/needs to come up with a special solution since the requirements to a persistence layer aren’t that different to other platforms out there and should start being a better “Open Source citizen” instead.
While I don’t think EAV is necessarily bad I feel the current implementation is over-engineered and – of course – slow. The fact that flat catalogs were introduced makes the implementation even more complex.
Here are two “solutions” to come into my mind:
1.) Keep it (super) simple: No EAV. Products live in a table with the most important attributes are columns. Store/website specific “overlays” could live in separate records in the same table or in a parallel one. Whoever needs to add anything to the product model simply takes care of it however he/she thinks works best, preferably by adding a different table and referencing it to the product id (composition over extension).
For attributes that are being generated dynamically this solution could rely on a simplified EAV model or go adding columns to the existing table or a new table referenced to the product.
Of course you might loose the generic ability to include any attribute to a collection, but isn’t this what makes everything super slow? And is this really required?
Haven’t thought this through and this simplified concept might run into serious limitations.
2.) “Not Magento’s Job”: Start using Doctrine and let them handle the data persistence. Using a clean repository pattern and let Doctrine do the abstraction seems to be a good solution that works for many other frameworks and platforms and as far as I know Doctrine has a concept in place that allows dynamic attributes (that internally are handled “EAV”-like).
Akaneo seems to have very similar requirements for flexibility and the seem to have solved this in a smart way: http://docs.akeneo.com/master/cookbook/product/query-products.html
This is quite a hot topic 🙂
At the very beginning of announcing M2, i remember the point to drop EAV…
Just a few thinks here.
One main point on EAV with Magento is multi websites/stores/views and multiple possible values for an attribute depending on the scope.
In my opinion, EAV should be dropped because of its complexity (for developers) and poor performance.
Flat tables could be used and tables fields could be dynamically added/removed.
Some kind of attributes could have their value stored as json because there is no need to use them in where/sort statements.
Hope this helps the reflection.
Regarding “use a CLOB field to store say JSON encoded name/value pairs in the main table. Makes it harder to do SQL queries or use the results, but no need for join queries.”, one site that used this approach was FriendFeed. They detailed a little bit about their indexing approach.
I noticed MySQL has XPath support as well, so I assume could encode attributes as XML in a BLOB field. http://dev.mysql.com/doc/refman/5.7/en/xml-functions.html Anyone tried this before?
I didn’t know that MySQL could do that, but
1.) I would be surprised if that’s acceptable fast and scales with a large catalog (both many products or large xml structure).
2.) I don’t think relying on special MySQL features is a good thing (this also relates to Bjoern’s comment of using more advanced MySQL feature like triggers), since this will move too much business logic into the database layer and will bind you to MySQL (and to storing data in relation databases in general) a lot more than it should. Although Magento2 gave up support for other database systems I don’t think it’s a smart idea to block this for future iterations by making architectural design decisions like this one. IMHO using triggers for Magento 1.13’s indexes already wasn’t a good idea.
MySQL now has JSON column types from MySQL 5.7. For me I think I favour this approach for applications where you may not be at liberty to go with a NOSQL database and you do not want to deal with the complexity of an EAV
I think this will have same disadvantage that fabriziobranca mentioned: we will be binded to MySQL in this case
We are looking into separating DB access better so people can slot in different DB tech without biz logic changes.