doc-comment 0.4: proc-macro time

Before starting, here is a small reminder: the doc-comment crate provides macros to help you write and test documentation.

Until now, the crate used declarative Rust macros. Even though I appreciated their simplicity a lot, they had some clear limitations that couldn't be overcome. For example, you couldn't use the doc_comment! macro to document the fields of a type. Another big problem was that you can't use macros to generate inner attributes:

macro_rules! foo {
    () => {{
        #![doc = "tadam"]
    }};
}

In this case, the compiler will simply refuse to compile because the inner attribute is misplaced. For all these reasons, I decided to fully switch to proc-macros for version 0.4. With this change, the minimum supported Rust version becomes 1.38 (I'll explain the technical reasons below).

Changes

Let's start with a small comparison. Before, you would have written:

doc_comment! {
    concat!("documentation", " is amazing!", include_str!("some_file.md")),
    struct Foo {
        field: i32, // we can't document it!
    }
}

Now it's written:

#[doc_comment("documentation ", "is amazing!", include_str!("some_file.md"))]
struct Foo {
    #[doc_comment("we now can document it!")]
    field: i32,
}

I find the new syntax much better and easier to follow. For the doctest macro, it's still the same:

doctest!("file_to_test.md");
doctest!("file_to_test.md", test_name);

Pretty nice, right? For everyone interested in the implementation issues and challenges, more to come in the next section! For the others, you can just jump to the conclusion. :)

Implementation

An attribute-like proc-macro is a function looking like this:

#[proc_macro_attribute]
pub fn doc_comment(attrs: TokenStream, item: TokenStream) -> TokenStream {
    // ...
}

The first argument contains the parameters inside the attribute (so with the example above it is: ["documentation ", "is amazing!", include_str!("some_file.md")]) and the second contains the item on which the attribute is used (so struct Foo { #[doc_comment("...")] field: i32, }). The output is the "transformed" input (so if you just return item, your proc-macro attribute will "disappear" and won't impact the code). The whole challenge is to update the TokenStreams without breaking everything. :)
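To make the "transformed input" idea concrete, here is a minimal string-level sketch. It is not the crate's actual code: expand_doc_comment and expand_noop are hypothetical names, and the arguments are assumed to be already-evaluated strings rather than a real TokenStream.

```rust
/// Hypothetical string-level model of an attribute macro: `attrs` are the
/// already-evaluated string arguments, `item` is the annotated item's source.
fn expand_doc_comment(attrs: &[&str], item: &str) -> String {
    // Concatenate the arguments, as `concat!` would.
    let doc: String = attrs.concat();
    // `{:?}` quotes and escapes the text so it reads as a string literal.
    format!("#[doc = {:?}]\n{}", doc, item)
}

/// Returning the item untouched makes the attribute "disappear".
fn expand_noop(_attrs: &[&str], item: &str) -> String {
    item.to_string()
}
```

With these assumptions, calling expand_doc_comment with the two string arguments from the example above puts a single #[doc = "..."] attribute in front of the item, while expand_noop returns the item's source unchanged.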

In case you don't know, when using proc-macros in your crate, Rust compiles in this order:

proc-macros
macros
Rust code

And this was my first big problem, and the reason why the minimum Rust version supported by this crate is 1.38. When include_str! is used inside the attribute, the compiler hasn't expanded it yet because we're still inside the proc-macro. So I had to "interpret" it myself and include the file. Getting the file content isn't complicated in itself: you get the current file path (using the file! macro), take its parent and then join it with the path given to include_str!.
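The path computation described above can be sketched with the standard library alone. The function name resolve_include is an assumption made for illustration, and current_file stands for whatever path the proc-macro obtained for the file containing the attribute.

```rust
use std::path::{Path, PathBuf};

/// Resolves the path given to `include_str!` relative to the file that
/// contains the macro call (hypothetical helper, not the crate's code).
fn resolve_include(current_file: &str, included: &str) -> PathBuf {
    // Take the directory of the current file; a bare file name has an
    // empty parent, which `join` treats as "no prefix".
    let parent = Path::new(current_file)
        .parent()
        .unwrap_or_else(|| Path::new(""));
    // Join it with the path written inside `include_str!`.
    parent.join(included)
}
```

For a file at src/lib.rs containing include_str!("some_file.md"), this yields src/some_file.md, which can then be read with std::fs::read_to_string.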

So what's the problem in this case, right? Well, in case you update this file, Rust has no way to know it needs to update the crate because it doesn't "track" this file's last modification time. How to make it aware of it then? Well, by using include_str of course! (Yep, trick time!)

So whenever the proc-macro encounters the include_str macro, it gets the file's content and also generates the following code:

const _: &'static str = include_str!("file_to_test.md");

And it is because of this very specific line that the minimum supported version is now 1.38. To be more precise: anonymous constants are the reason. Before Rust 1.38, you can't declare anonymous constants, which makes it really complicated to generate an arbitrary number of constants without duplicate names (it would force the doc-comment crate to keep global state to track them, for example, and a generated name could still conflict with a constant from the crate using doc-comment).
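As a rough illustration of why anonymous constants help, the sketch below generates one tracking constant per included file as source text. The helper names tracking_const and tracking_consts are assumptions, not part of doc-comment. The two const _ items at the end show that several anonymous constants can live in the same scope without clashing.

```rust
/// Builds the source text of the anonymous constant emitted for one
/// included file (hypothetical helper).
fn tracking_const(path: &str) -> String {
    // `{:?}` prints the path as a quoted, escaped string literal.
    format!("const _: &'static str = include_str!({:?});", path)
}

/// One constant per file; since they are all named `_`, no unique
/// names have to be invented or tracked.
fn tracking_consts(paths: &[&str]) -> String {
    paths
        .iter()
        .map(|p| tracking_const(p))
        .collect::<Vec<_>>()
        .join("\n")
}

// Several anonymous constants can coexist in one scope.
const _: &str = "first";
const _: &str = "second";
```

With named constants, the generator would need a counter or a hash to keep names unique, and a generated name could still collide with one defined in the user's crate.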

Parsing items

To document fields, you can't expect your attribute proc-macro to be called on the field directly. It doesn't work this way: it can only be called on items (types, impls, functions, etc). Therefore, if you want to be able to document a field, you actually need to parse the whole second argument (item).

Before going further, let's talk about TokenStream. It's basically a sequence of TokenTree values, an enum which represents the Rust parser items, but simplified. The variants are Punct, Ident, Literal and Group (see the proc_macro documentation for the details of each variant). Anyway, the big advantage is that you can convert it to a string by simply calling .to_string() on it and convert it back to a TokenStream by calling .parse().unwrap().

So back to parsing item: in itself, it's not very complicated, you just need to be careful not to break the syntax itself when reading the TokenStream if you play with strings (which I do). The thing is that you need to detect an attribute (a Punct('#') followed by a Group surrounded by bracket delimiters). And of course, you need to recursively go down every time you encounter a Group (it can be a sub-item).
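The detection logic can be sketched without the proc_macro crate (which only works inside a proc-macro) by using a simplified stand-in for TokenTree. The Tok enum and the find_doc_comments function are assumptions made for illustration; the real TokenTree also carries spans and spacing information.

```rust
/// Simplified stand-in for `proc_macro::TokenTree` (assumption: spans
/// and spacing are left out).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Punct(char),
    Ident(String),
    Literal(String),
    /// A delimited group: its opening delimiter and its content.
    Group(char, Vec<Tok>),
}

/// Collects the arguments of every `#[doc_comment(...)]` in `tokens`,
/// descending into groups so that attributes on fields are found too.
fn find_doc_comments<'a>(tokens: &'a [Tok], found: &mut Vec<&'a [Tok]>) {
    let mut i = 0;
    while i < tokens.len() {
        // An attribute is a `#` followed by a bracket-delimited group.
        if let (Tok::Punct('#'), Some(Tok::Group('[', inner))) =
            (&tokens[i], tokens.get(i + 1))
        {
            if let [Tok::Ident(name), Tok::Group('(', args)] = inner.as_slice() {
                if name.as_str() == "doc_comment" {
                    found.push(args.as_slice());
                    i += 2; // skip both the `#` and the `[...]` group
                    continue;
                }
            }
        }
        // Any other group may contain sub-items: recurse into it.
        if let Tok::Group(_, inner) = &tokens[i] {
            find_doc_comments(inner, found);
        }
        i += 1;
    }
}
```

Note that the match is made purely on the identifier text, so only attributes spelled doc_comment are collected.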

The big issue with this is that you're forced to pick a given name for your attribute when applied to a field. For example, this won't work:

use doc_comment::doc_comment as dc;

#[dc("hello")] // It works.
struct Foo {
    #[dc("field")] // It doesn't work!
    field: i32,
}

My parser cannot know that you renamed it to dc (however, if there is a way to know, please tell me!), limiting it to doc_comment:

use doc_comment::doc_comment as dc;

#[dc("hello")] // It works.
struct Foo {
    #[doc_comment("field")] // It works...
    field: i32,
}

Inner attributes

What originally motivated me to convert doc-comment to proc-macros was the hope of finally being able to generate inner attributes. Well, in short, it's not possible. Funny thing though:

#[proc_macro_attribute]
pub fn doc_comment(attrs: TokenStream, item: TokenStream) -> TokenStream {
    item
}

If you just return item without any modifications of any kind (meaning you don't include attrs), then the compiler is fine with it (but your proc-macro is perfectly useless):

#![doc_comment("inner attribute but invisible! :D")]

There is an open issue on the Rust repository which tracks this feature. Let's hope we'll be able to see it soon!

Syn, quote, ...

It's pretty common for crates providing proc-macros to use these crates and some others to make the developer's life easier. Since my crate is quite small, I didn't see much advantage after trying them out so I decided to just not use them for the moment. Might be worth it in the future though, even more for the compilation errors (which are nightly only in the proc-macro API!).

To expand a bit on the compiler errors: each token in a TokenStream has a Span, which represents where it is located in the source files. With a Span, you can emit errors that point at the right place in the code. The problem is that most of these methods are nightly-only, Span::error for example. You can use proc-macro2 to be able to use them on stable, but I didn't find doc-comment's errors bad enough to use it. But again, I'll very likely do so in the future.

Conclusion

This conversion to proc-macros at least allowed me to discover them from an implementation point of view. I was very disappointed by a lot of points, but I like the new syntax better for my doc_comment (proc-)macro. At least now people can use it to document fields, which is more than enough of an improvement to justify it.

Posted on the 01/07/2020 at 13:30 by @GuillaumeGomez
