Today, Magento 2 uses relative file paths for XSD references. Apart from being ugly, this imposes restrictions on the directory structure. For example, if we want to put Composer packages under ‘vendor’ in the future, the relative paths would break. So we are looking at using absolute paths for the XSD references. The question is how best to achieve this.
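To make the breakage concrete, here is a minimal sketch of how a relative schema reference is resolved against the directory of the XML file that contains it (an approximation of relative URI resolution for plain file paths). All directory names below are hypothetical and do not reflect Magento’s actual layout; the point is only that the same reference string resolves to a different file once the module moves under `vendor`.

```python
import posixpath

# Hypothetical paths for illustration only; not Magento's real directory layout.
SCHEMA = "/magento/lib/xsd/layout.xsd"
REF = "../../../../../../lib/xsd/layout.xsd"  # relative reference baked into the XML file


def resolve(xml_file: str, ref: str) -> str:
    """Resolve a relative schema reference against the referencing file's directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(xml_file), ref))


old_location = "/magento/app/code/Magento/Theme/view/layout/default.xml"
new_location = "/magento/vendor/magento/module-theme/view/layout/default.xml"

print(resolve(old_location, REF))  # /magento/lib/xsd/layout.xsd
print(resolve(new_location, REF))  # /lib/xsd/layout.xsd (wrong file: the reference broke)
```

Every XML file in the moved package would need its reference rewritten, which is exactly the restriction an absolute identifier avoids.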
A common approach is to use a real HTTP-based URL that can be resolved. That way, tools can fetch the XSD referenced by the URL. There are, however, drawbacks:
- You need to host it at a URL that will never change. Doable, but it does impose restrictions while developing XSDs. (A possible example: http://magento.com/xsd/2.0.0/layout.xsd. Alternatively, assuming GitHub never goes away, you could use a GitHub-based URL.)
- It can cause problems while an XSD is still under development. If the XSD lives in your Git repo instead, you can version it together with the rest of your system (which is particularly useful during the development phase).
- There is a risk that your installation may download the file when you don’t expect it. We ship the file with Magento, so ideally tools should load it from the local file system for speed. I have encountered (non-Magento) sites where production was slowed down by a misconfiguration that caused the XSD file to be downloaded repeatedly.
- It can amount to a denial-of-service attack on whoever hosts the file. Remember that there are a lot of Magento sites out there; all of them hitting the central XSD file adds up if it is not cached properly.
There are several solutions.
- Use a URN. We define our own scheme, with a path that looks something like Vendor/Module/file.xsd (maybe plus a version number).
- Use a URL, but don’t actually make it resolvable. That is, it looks like a URL but if you type it into a web browser nothing will be downloaded.
- Don’t include a path (just use “layout.xsd”) and have tools map that to the real file path. (This is the old idea of a “SYSTEM” path for DTDs.)
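All three options share one property: the identifier is only a key, and something local turns it into a file path. A minimal sketch of such a resolver follows. The `urn:magento:` prefix, the module-to-directory map, and every path are hypothetical placeholders, not an agreed scheme. The module map plays the same role as an autoloader’s namespace-to-directory map, and unknown identifiers raise an error rather than falling back to an HTTP download.

```python
# Hypothetical identifiers and paths for illustration only.
MODULE_DIRS = {  # option 1: Vendor/Module -> installed directory (like an autoloader map)
    "Magento/Theme": "/magento/vendor/magento/module-theme/etc",
}
EXACT = {  # option 2 (non-resolvable URL) and option 3 (bare file name)
    "http://magento.com/xsd/2.0.0/layout.xsd": "/magento/vendor/magento/module-theme/etc/layout.xsd",
    "layout.xsd": "/magento/vendor/magento/module-theme/etc/layout.xsd",
}
URN_PREFIX = "urn:magento:"


def resolve_schema(ref: str) -> str:
    """Map a schema identifier to a local file path without any network access."""
    if ref in EXACT:
        return EXACT[ref]
    if ref.startswith(URN_PREFIX):
        # Option 1: urn:magento:Vendor/Module/file.xsd
        module, _, filename = ref[len(URN_PREFIX):].rpartition("/")
        if module in MODULE_DIRS:
            return f"{MODULE_DIRS[module]}/{filename}"
    # Unknown identifiers fail loudly instead of silently hitting the internet.
    raise LookupError(f"no local schema registered for {ref!r}")


print(resolve_schema("urn:magento:Magento/Theme/layout.xsd"))
print(resolve_schema("layout.xsd"))
```

With this shape, moving packages under `vendor` only changes the map, not the XML files that reference the schemas.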
Having been bitten in the past by production issues caused by XSD schema references hitting the internet, my personal bias is away from using an HTTP-based identifier, but I admit I may just be jaded. Having the path resolved by your local installation also helps while XSDs are not yet stable. But what is your opinion? Please vote, or leave a comment if you have practical experience you think is relevant.
Oh, and for those who want to read up on what PhpStorm supports, try https://www.jetbrains.com/idea/help/referencing-dtd-or-schema.html. It would be nice if we could ship Magento with the mapping file from URIs to local file paths, to reduce developer setup effort, like a CATALOG file from the good old days of SGML.
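The modern successor of the SGML CATALOG is the OASIS XML Catalog format, where a `uri` entry maps a URI reference (a URN or URL) to a local location. The sketch below generates such a catalog from a mapping, assuming it would be produced at install time; the identifiers and paths are hypothetical, and whether a particular IDE or validator reads OASIS catalogs (rather than its own mapping format) is an assumption to check against that tool’s documentation.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mapping: dict[str, str]) -> str:
    """Render an OASIS XML Catalog mapping schema identifiers to local files."""
    ET.register_namespace("", CATALOG_NS)  # serialize as the default namespace
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for name, path in sorted(mapping.items()):
        # <uri name="identifier" uri="local location"/> for each schema
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", {"name": name, "uri": "file://" + path})
    return ET.tostring(root, encoding="unicode")


# Hypothetical identifiers and install paths.
print(build_catalog({
    "urn:magento:Magento/Theme/layout.xsd": "/magento/vendor/magento/module-theme/etc/layout.xsd",
}))
```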
I am in favor: they should point to a real URL, so that if someone finds this XML file and wonders what standard it follows, they can look it up.
But that only applies to a stable standard. Before it reaches that state, it should point to a non-resolvable URL, so it still has a unique ID and we can configure our dev tools to use a local file for it; something like http://magento.com/xsd/1.0-beta/layout.xsd.
And inside Magento it should never be downloaded; a local file should always be used. I don’t have much experience with this, but I think a validator should always use a local file and never need to download anything “on the fly”.
The part I don’t understand yet is why a production environment would actively validate itself on an ongoing basis. If that is currently the case, I would consider it a bug that warrants further investigation.
To me, the XSDs describe a file format that helps during development. The only other time I think validation should be used is when installing a new extension into your Magento environment, but again, this should ideally not happen directly in the production environment (it might also be offloaded to the new Connect, i.e. everything installed from Connect has already passed validation).
I am with Flyingmama on this: once it has stabilised, host it online under magento.com (shortly before the developer RC). The beauty of a properly versioned format/URL is that you can set a cache expiry of decades, as the content at that URL will not (or at least should not) change.
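The two hosting positions above (pre-release identifiers stay non-resolvable, released versions are served with a very long cache lifetime) can be expressed as a small routing rule. This is a minimal sketch assuming the `/xsd/<version>/<file>.xsd` layout from the examples in this thread; the one-year `max-age` is an assumed value (a longer one could be chosen), and `immutable` is the standard `Cache-Control` extension telling caches the content will never change.

```python
import re

# Hypothetical URL layout following the thread's examples, e.g. /xsd/2.0.0/layout.xsd
VERSIONED_XSD = re.compile(
    r"^/xsd/(?P<version>\d+\.\d+(?:\.\d+)?)(?P<pre>-[0-9A-Za-z.]+)?/[\w.-]+\.xsd$"
)
ONE_YEAR = 365 * 24 * 60 * 60  # assumed cache lifetime in seconds


def schema_response(path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to a schema URL."""
    match = VERSIONED_XSD.match(path)
    if match is None or match.group("pre"):
        # Unknown or pre-release (e.g. 1.0-beta) schemas are identifiers only: not served.
        return 404, {}
    # A released version never changes, so any cache may keep it for a long time.
    return 200, {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}


print(schema_response("/xsd/2.0.0/layout.xsd"))
print(schema_response("/xsd/1.0-beta/layout.xsd"))
```

Long-lived caching reduces the load on the host, but it does not stop misconfigured installations from requesting the file in the first place.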
You are correct: it SHOULD NOT need to validate in production mode. If we use a URN, I can guarantee it. But I don’t have a history with PHP, which is why I was asking for the community’s experiences. (We have our own internal opinions too, of course.) If it does not happen in PHP land, then my concern there goes away. I just need to make sure we don’t mount a DoS attack against ourselves.
Alan
To those who prefer to identify XML schemas using URLs: how do you choose to refer to a PHP class? By its name (fully qualified) or by a location (e.g. a URL) at which a file containing the class can currently be located by you, but maybe not by others?
Presumably you prefer to do the former, with some separate means of locating a class based on its name, e.g. an autoloader.
The problems of locating XML schemas and PHP classes based on their names aren’t identical, but I think they are similar.
I don’t see why an XML schema should be identified by a location which could change. I also really dislike when schemas use a non-existent URL as a namespace, because it gives the appearance that something is broken.
Can’t we use URNs for namespaces and the schemaLocation attribute to tell developers and parsers where to locate schemas?
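For reference, `xsi:schemaLocation` holds whitespace-separated pairs of namespace name and location hint, so the namespace (here a URN) stays the stable identity while the location is only a hint a tool may override. The sketch below reads those pairs from a document; the `urn:magento:layout` namespace and the hint URL are hypothetical examples, not an adopted convention.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document: a URN names the schema, schemaLocation hints where a copy lives.
DOC = """<layout xmlns="urn:magento:layout"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:magento:layout http://magento.com/xsd/2.0.0/layout.xsd"/>"""


def schema_locations(xml_text: str) -> dict[str, str]:
    """Return namespace -> location hints from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # The attribute is a flat list: namespace1 location1 namespace2 location2 ...
    return dict(zip(tokens[0::2], tokens[1::2]))


print(schema_locations(DOC))
```

A validator could look up the namespace in a local map first and treat the hint as a fallback, which keeps production installations off the network.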