Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement deserialize #138

Open
wants to merge 18 commits into
base: pz-serialize
Choose a base branch
from
Open

Implement deserialize #138

wants to merge 18 commits into from

Conversation

peterzhu2118
Copy link
Member

Based on #137.

This PR implements Liquid::Template.load which reads and deserializes a parse liquid template.

I added a SerializeParseContext class that is a subclass of ParseContext that is used to store the information required for deserialization. The actual deserialization work is done in the function block_body_parse_from_serialize that is called during Liquid::BlockBody.parse when a SerializeParseContext is passed in.

require 'test_helper'

class TemplateTest < MiniTest::Test
def test_serialize
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to (or need to) further improve test coverage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to stress test it would be if we could run the tests with a monkey patch that automatically serializes and deserializes on Liquid::Template.parse. Even if we didn't always run CI with that, it would help us find missing tests by adding a test for any failure that it turns up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we add this to the benchmarks, and run those as part of CI it would add coverage for free.

We also need to make sure Liquid::Template.load(serialize).render! is, and will remain faster than Liquid::Template.parse(source).render!.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented benchmarks in Shopify/liquid#1385. Deserialize is about 20% faster than parsing from source.

@tobi
Copy link
Member

tobi commented Dec 17, 2020

Great work Peter.

@peterzhu2118
Copy link
Member Author

Thank you Tobi! ♥

Copy link
Contributor

@macournoyer macournoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! That was a big piece. Nice work 🙇

The only potential issue I see is w/ the assert(RTEST(tag_class));

ext/liquid_c/block.c Outdated Show resolved Hide resolved
VALUE markup = rb_utf8_str_new(tag_markup_header_markup(current_tag), current_tag->markup_len);

VALUE tag_class = rb_funcall(tag_registry, intern_square_brackets, 1, tag_name);
assert(RTEST(tag_class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should raise? Seems technically possible that tag could get removed between the time the template is parsed, cached (for a long time) & deserialized.

Maybe using Block.unknown_tag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good point. On a unrelated note, should block.c be renamed to block_body.c since it actually contains the code for Liquid::C::BlockBody?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, block_body.c would be a more appropriate name for this file.

@peterzhu2118 peterzhu2118 force-pushed the pz-deserialize branch 3 times, most recently from 0b504b7 to 07ffbfb Compare January 7, 2021 16:37
ext/liquid_c/serialize_parse_context.c Outdated Show resolved Hide resolved
require 'test_helper'

class TemplateTest < MiniTest::Test
def test_serialize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to stress test it would be if we could run the tests with a monkey patch that automatically serializes and deserializes on Liquid::Template.parse. Even if we didn't always run CI with that, it would help us find missing tests by adding a test for any failure that it turns up.

dump_load_eval('{% if test %}goodbye {% else %}hello {% endif %}world', 'test' => false))
assert_equal('hello world', dump_load_eval('{% if true %}hello {% endif %}{% if true %}world{% endif %}'))
assert_equal('123', dump_load_eval('{% for i in (1..10) %}{{i}}{% if i == 3 %}{% break %}{% endif %}{% endfor %}'))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a unit test that ensures that line numbers are being set properly. I think Liquid::ParseContext#line_number is being inherited by Liquid::C::SerializeParseContext without it ever being set on deserialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is e3f403c the kind of test you were thinking of?

ext/liquid_c/template.c Outdated Show resolved Hide resolved
ext/liquid_c/template.c Outdated Show resolved Hide resolved
const char *data = RSTRING_PTR(source);

document_body_header_t *header = (document_body_header_t *)data;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least have a serialization format version number in the header to check against in order be able to make backwards incompatible changes. We will need to actually raise an exception for that check to safely handle the incompatible version error by loading and parsing the template source.

Another check we could use would be an endianness check. Little-endian seems fairly standard these days, so if we don't want to extend the header and add a runtime check, we could have a compile-time check for it in extconf.rb. We can do the same if we add any more platform specific dependencies.

We should also document that this method needs to be passed data serialized by this library to be used safely, since we probably don't want to add significant overhead to provide runtime safety.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 005b088. I added a Liquid::C::DeserializationError class that is raised for all serialization incompatibility reasons.

ext/liquid_c/serialize_parse_context.c Outdated Show resolved Hide resolved
ext/liquid_c/block.c Outdated Show resolved Hide resolved
ext/liquid_c/tag_markup.c Outdated Show resolved Hide resolved
@peterzhu2118
Copy link
Member Author

@dylanahsmith I added a rake task that runs the liquid integration tests for serialization in a65f10e and it's now running as part of CI. The commits after that commit are fixes I made from bugs discovered when running the integration tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants