Is it YAML?

Enter some text in the text field below and click "Submit" to check whether it is YAML. Hit "Reset" to revert to "Hello World".



About

"Is it YAML?" is a Django application for checking whether text is YAML ("a human friendly data serialization standard for all programming languages") or not. Users type or paste text and click the "Submit" button above. If the input is valid YAML, it is presented in canonical form; invalid input causes the application to display an error message.
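The check described above can be sketched in a few lines of PyYAML. This is a minimal sketch, not the app's actual code: the function name `check_yaml` and its `(ok, text)` return value are hypothetical. It composes the input into node graphs, serializes them back in canonical form, and catches PyYAML's base `YAMLError` when the input fails to parse.

```python
import yaml


def check_yaml(text):
    """Hypothetical helper: return (True, canonical_text) for valid YAML,
    or (False, error_message) otherwise."""
    try:
        # compose_all() builds a node graph for each document in the stream.
        # It records tags as strings and never constructs Python objects.
        nodes = list(yaml.compose_all(text, Loader=yaml.SafeLoader))
        # serialize_all() turns the node graphs back into text; canonical=True
        # emits the explicit, fully tagged form.
        return True, yaml.serialize_all(nodes, canonical=True)
    except yaml.YAMLError as exc:
        # Scanner, parser and composer errors all derive from YAMLError.
        return False, str(exc)
```

For example, `check_yaml("[1, 2")` should come back with `ok` set to `False` and a parser error about the unclosed flow sequence.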

YAML is just a text format for exchanging data. It exists for cases where XML is too much overhead. I like YAML. I find it more robust in practice than XML (where one missing angle bracket could corrupt a whole file), yet easier to type by hand.

I conjured up "Is it YAML?" while writing a specification for YPath, a language for addressing parts of a YAML document, as XPath does for XML. To do a proper job of things, I had to write YAML example files, and I sometimes needed to check that they were good and proper. "Is it YAML?" gave me a test bed where I could copy and paste my examples and see if they really were YAML.

Installation and Dependencies

Apart from Django, the app depends on PyYAML, a Python parser for YAML. You can get the application from PyPI through the command:

pip install isityaml

Once installed, just add "isityaml" to your INSTALLED_APPS list in settings.py, and add a URL pattern for the app to one of your urls.py files.
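In a typical project the settings change looks something like this. It is a sketch only: the `django.contrib` entries stand in for whatever your project already lists, and the commented URL line is hypothetical, since the module name `isityaml.urls` and the `isityaml/` prefix are assumptions. Check the installed package for the URL configuration it actually provides.

```python
# settings.py (fragment). The django.contrib entries are placeholders for
# the apps your project already has.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "isityaml",  # the app installed by `pip install isityaml`
]

# urls.py (fragment), hypothetical: "isityaml.urls" is an assumed module name.
# from django.urls import include, path
# urlpatterns = [path("isityaml/", include("isityaml.urls"))]
```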

The installation comes with an HTML template file designed to work with Mezzanine, a Django CMS. You are looking at the result. Feel free to customise it: the app is released under a 3-clause BSD license. If you wish to make any changes, pop over to the GitHub repository for the app.

History

  • 0.1 (August 30th 2011) - Created setup script for files.
  • 0.2 (April 25th 2013) - Tried to make a half-decent PyPI package.
  • 0.3 (January 30th 2014) - Updated to be compatible with Django 1.6 and Mezzanine 3.0.
  • 0.4 (February 15th 2014) - Added more error handling and styling to be compatible with Bootstrap.
  • 0.5 (August 13th 2014) - Cleaned up error handling and installation issues.
  • 0.6 (February 28th 2017) - Updated to be compatible with Django 1.10.
  • 0.7 (December 6th 2020) - Ran the code through 2to3 for Python 3 compatibility.

Security

"Is it YAML?" runs on a public-facing website. Any internet user can view it, benign or malevolent. So an obvious question is: could this application be hacked? The thought crossed my mind after encountering the presentation "Serialization formats aren't toys", delivered by Tom Eastman at PyCon Australia. The talk covered security issues in XML, JSON and YAML, and it can be viewed online on the conference's YouTube channel. Let's look at the attack vector mentioned in the video, and see if it could appear here.

"Is it YAML?" uses PyYAML as the underlying parser. YAML uses tags to indicate the types of data one may encounter in a document. PyYAML supports tags of the form "!!python/object/apply:module.function", whose values are expressed as a sequence of arguments. If YAML data is constructed as Python objects (note the use of the word "constructed"), the parser looks up the function in the named Python module, passes it those values as arguments, and calls it on the host machine. As Tom Eastman pointed out in the talk, PyYAML will even go to the trouble of loading the module if it isn't already loaded. This could be a problem with untrusted input, and since this app is on the Interwebs, all its input is untrusted by definition. The first example used in the talk appears designed to get a directory listing:

"contents_of_cwd": !!python/object/apply:subprocess.check_output ['ls']

He then followed it up with:

"goodbye": !!python/object/apply:os.system ["rm *"]
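For contrast, here is what happens when an application does want Python objects but builds them with PyYAML's `safe_load()` rather than a full or unsafe loader. This is not what "Is it YAML?" does (it never constructs objects at all); it is a sketch, with a hypothetical helper name `constructs_safely`, showing that the safe loader has no constructor for `python/` tags and raises `ConstructorError` instead of importing `subprocess`.

```python
import yaml
from yaml.constructor import ConstructorError

# The first fragment from the talk, which asks for a directory listing.
PAYLOAD = "\"contents_of_cwd\": !!python/object/apply:subprocess.check_output ['ls']"


def constructs_safely(text):
    """Hypothetical helper: build Python objects with the safe loader.

    Returns the constructed data, or None if the loader refused a tag.
    """
    try:
        return yaml.safe_load(text)
    except ConstructorError as exc:
        # SafeLoader knows nothing about python/* tags, so it raises here
        # instead of importing subprocess and calling check_output.
        print("refused:", exc.problem)
        return None


constructs_safely(PAYLOAD)
```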

Could "Is it YAML?" be used to destroy a website hosting it? After having a good look at my code, I can state with confidence: not by this attack vector. The app parses the input, but only far enough to make sure it is valid YAML: sequences that begin with an opening square bracket end with a closing one, strings that begin with a quote end with one, and so on. It goes as far as composing a node graph of the contents (which is serialized back into canonical form for display), but it doesn't take the extra step of constructing Python objects with the data types specified in the input YAML. So the app happily parses the last YAML fragment, produces the following, and continues along its merry way:

---
!!map {
  ? !!str "goodbye"
  : !!python/object/apply:os.system [
    !!str "rm *",
  ],
}
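The composition step can be seen directly with `yaml.compose()`. A minimal sketch, assuming only that PyYAML is installed: it composes the second fragment from the talk, shows that the `python/object/apply` tag survives only as a string on a node, and serializes the graph back in canonical form, which should print the output shown above.

```python
import yaml

# The second fragment from the talk.
FRAGMENT = '"goodbye": !!python/object/apply:os.system ["rm *"]'

# compose() stops at the node graph: the tag is kept as a plain string,
# no module is imported and no function is called.
root = yaml.compose(FRAGMENT, Loader=yaml.SafeLoader)

# A mapping node's value is a list of (key_node, value_node) pairs.
key_node, value_node = root.value[0]
print(key_node.value)                              # the key scalar
print(value_node.tag)                              # the full tag URI
print([item.value for item in value_node.value])   # the sequence's scalars

# Serialize the same graph back into canonical form for display.
print(yaml.serialize(root, canonical=True))
```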

JSON is a subset of YAML, and Tom Eastman also brought up an attack vector for JSON: it seems some people use eval() to parse untrusted JSON input. That's ... dangerous. Fortunately, PyYAML does not use eval() to parse its input, so this danger does not apply here.
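Parsing JSON with a real parser rather than eval() looks like this. A minimal sketch; the sample document is hypothetical. Both the standard-library `json` module and PyYAML's `safe_load()` read the text as data and return ordinary dictionaries and lists; neither evaluates it as code.

```python
import json

import yaml

# A hypothetical JSON document, as might arrive from an untrusted client.
TEXT = '{"name": "Is it YAML?", "versions": [0.6, 0.7], "stable": true}'

# Both parsers treat the input purely as data.
from_json = json.loads(TEXT)
from_yaml = yaml.safe_load(TEXT)

print(from_json == from_yaml)
```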

If you wish to host "Is it YAML?" publicly, you have to ask yourself: is the app safe? I believe it is, or at least as safe as the underlying Django code that supports it. It takes textual input, treats it as data, attempts to parse it, and catches the exceptions raised when the data fails to parse. It never treats the input as executable code. That's what I reckon, and I stand by it by hosting the app myself on my site.

Copyright © Peter Murphy 2011–2024.
