rustdoc and the re-exports

rustdoc is the tool used to generate documentation in Rust. In this blog post, we will see why handling re-exports with rustdoc is so complicated and how it's done.

So first, let's see what a re-export is by taking an example.

What are re-exports?

Let's say we are writing a library (named lib) with some types spread across sub-modules:

pub mod sub_module1 {
    pub struct Foo;
}
pub mod sub_module2 {
    pub struct AnotherFoo;
}

Users can import them like this:

use lib::sub_module1::Foo;
use lib::sub_module2::AnotherFoo;

But what if we want the types to be available directly at the crate root, or we don't want the modules to be visible to users? That's where re-exports come in:

// `sub_module1` and `sub_module2` are not visible outside.
mod sub_module1 {
    pub struct Foo;
}
mod sub_module2 {
    pub struct AnotherFoo;
}

// We re-export both types:
pub use crate::sub_module1::Foo;
pub use crate::sub_module2::AnotherFoo;

And now users will be able to do:

use lib::{Foo, AnotherFoo};

And since both sub_module1 and sub_module2 are private, users won't be able to import them.
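To make this concrete, here is a small single-file sketch. The derives and the `reexport_is_same_type` helper were added for illustration. It shows that a re-export does not create a new type. It only adds a new public path to the same item, which `std::any::TypeId` can confirm.

```rust
use std::any::TypeId;

// Private modules: not reachable from outside the crate.
mod sub_module1 {
    #[derive(Debug, Default)]
    pub struct Foo;
}
mod sub_module2 {
    #[derive(Debug, Default)]
    pub struct AnotherFoo;
}

// Re-exports: new public paths at the crate root, pointing to the same items.
pub use crate::sub_module1::Foo;
pub use crate::sub_module2::AnotherFoo;

/// Hypothetical helper: checks that the re-exported `Foo` is the very same
/// type as the one defined in `sub_module1`.
fn reexport_is_same_type() -> bool {
    TypeId::of::<Foo>() == TypeId::of::<sub_module1::Foo>()
}
```

Because both paths name a single item, a documentation tool has to choose how to present it. It can show a `pub use` line pointing elsewhere, or show the item as if it were defined at the re-export site.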

Now let's see why rustdoc has such a hard time handling them.

How rustdoc works

Let's first explain (very) quickly how rustdoc works: rustc provides an API to access its internal data, allowing "custom drivers" (like clippy and rustdoc) to do what they do, although at different levels. It requires a bit of setup first (I'm writing a crate to help with this part; you can check it out here), but once that's done, you have access to rustc internals.

Now we can go through all the items rustc knows about and start working on them. To do so, we use a Visitor: you only need to implement the methods you're interested in, and the visitor calls them as it walks through the items.

Once we have all the information we want, we process everything in order to render into the expected format (either HTML or JSON).
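rustc's real visitors live in its internal crates, and their exact signatures change between versions. The following is therefore only a toy sketch of the pattern described above, using a made-up `Item` tree. It is loosely modeled on a `visit_*`/`walk_*` split. Each `visit_*` method has a default implementation that keeps walking. A custom visitor (the hypothetical `UseCollector`) overrides only the method it cares about, here `use` items.

```rust
/// Toy item tree standing in for rustc's internal representation.
#[allow(dead_code)]
enum Item {
    Module { name: &'static str, items: Vec<Item> },
    Struct { name: &'static str },
    Use { path: &'static str },
}

/// Every method has a default implementation, so a visitor only overrides
/// what it is interested in.
trait Visitor {
    fn visit_item(&mut self, item: &Item) {
        walk_item(self, item);
    }
    fn visit_use(&mut self, _path: &str) {}
}

/// Default traversal: recurse into modules and dispatch `use` items.
fn walk_item<V: Visitor + ?Sized>(visitor: &mut V, item: &Item) {
    match item {
        Item::Module { items, .. } => {
            for child in items {
                visitor.visit_item(child);
            }
        }
        Item::Use { path } => visitor.visit_use(path),
        Item::Struct { .. } => {}
    }
}

/// A visitor that only cares about `use` items.
#[derive(Default)]
struct UseCollector {
    paths: Vec<String>,
}

impl Visitor for UseCollector {
    fn visit_use(&mut self, path: &str) {
        self.paths.push(path.to_string());
    }
}
```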

Now back to the re-exports.

rustdoc and the re-exports

There are many different problems with re-exports, so let's go through some of them.

To inline or not to inline?

First, let's explain what "inlining" means in this context. As usual, let's show it with code and images:

pub mod public {
    pub struct Foo;
    pub struct Bar;
}

#[doc(inline)]
pub use crate::public::Bar;
pub use crate::public::Foo;

And here is what the documentation for this code looks like:

So we can see a module, a re-export and a struct. The struct is public::Bar, which was inlined because of #[doc(inline)]. This means that even though it's a re-export, we see it as if it were declared directly in the crate root.

Apart from #[doc(inline)], a re-export will be inlined if:

The item is private.
The item is in a non-public module.
The item is in a hidden module (with #[doc(hidden)]).

Important to note: if an item has #[doc(hidden)], then the only way to make it appear through a re-export is to add #[doc(inline)] on the corresponding re-export.

To sum all this up in code:

pub mod public {
    pub struct Bar;
    #[doc(hidden)]
    pub struct PubHidden;
}

#[doc(hidden)]
pub mod hidden {
    pub struct Hidden;
}

mod private {
    pub struct Private;
}

#[doc(inline)] // Needed because `public` is, well, public.
pub use crate::public::Bar;
#[doc(inline)] // Needed because `PubHidden` has `#[doc(hidden)]`.
pub use crate::public::PubHidden;
pub use crate::hidden::Hidden;
pub use crate::private::Private;
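The rules above can be condensed into a small decision function. This is a simplified model written for this post, not rustdoc's code. `Target`, `Rendering` and `render` are hypothetical names. Real rustdoc also handles cases this model ignores, such as `#[doc(no_inline)]`.

```rust
/// Hypothetical, simplified facts about the item a re-export points to.
struct Target {
    /// The item itself is `pub`.
    item_is_public: bool,
    /// Every module on the way to the item is `pub`.
    modules_are_public: bool,
    /// One of those modules has `#[doc(hidden)]`.
    module_is_hidden: bool,
    /// The item itself has `#[doc(hidden)]`.
    item_is_hidden: bool,
}

#[derive(Debug, PartialEq)]
enum Rendering {
    /// Documented as if declared where the `pub use` is.
    Inlined,
    /// Shown as a `pub use` line in the re-exports list.
    ReExportLine,
    /// Not shown at all.
    Nothing,
}

/// `has_doc_inline` says whether the `pub use` carries `#[doc(inline)]`.
fn render(has_doc_inline: bool, target: &Target) -> Rendering {
    if has_doc_inline {
        return Rendering::Inlined;
    }
    if target.item_is_hidden {
        // Only `#[doc(inline)]` can bring a `#[doc(hidden)]` item back.
        return Rendering::Nothing;
    }
    if !target.item_is_public || !target.modules_are_public || target.module_is_hidden {
        Rendering::Inlined
    } else {
        Rendering::ReExportLine
    }
}
```

Each re-export from the examples maps to one call. `Bar` and `PubHidden` are inlined thanks to `#[doc(inline)]`, `Hidden` and `Private` are inlined because of where they live, and the earlier `Foo` stays a plain re-export line.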

Re-exports of re-exports

In rustc internals, a re-export is represented by an enum variant containing:

A path: x::y::z for example.
A kind: to simplify, it's either a glob import (x::*) or a single import (x::y).
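Here is a rough illustration of that representation, as a toy version that only tells the two kinds apart from the path text. It is a sketch under heavy assumptions. `ImportKind`, `ReExport` and `parse_use` are hypothetical. It ignores `as` renames and `{...}` groups, and unlike rustc it carries no resolution.

```rust
/// Hypothetical stand-in for the two kinds of `use` described above.
#[derive(Debug, PartialEq)]
enum ImportKind {
    /// `use x::*;`
    Glob,
    /// `use x::y;`
    Single,
}

/// A re-export reduced to the path segments written in the source plus its kind.
#[derive(Debug)]
struct ReExport<'a> {
    segments: Vec<&'a str>,
    kind: ImportKind,
}

/// Splits a simple `use` path (no braces, no `as`) into a `ReExport`.
fn parse_use(path: &str) -> ReExport<'_> {
    let segments: Vec<&str> = path.split("::").collect();
    let kind = if segments.last() == Some(&"*") {
        ImportKind::Glob
    } else {
        ImportKind::Single
    };
    ReExport { segments, kind }
}
```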

Now what's interesting is that the path contains the "resolution" of the import, allowing us to get access to the item being imported/re-exported. And here comes the first big issue. Let's consider this code:

mod a {
    /// Some docs.
    pub struct Foo;
}
mod b {
    /// first
    pub use crate::a::Foo;
}
/// second
pub use crate::b::Foo;

Here, the only visible item in the documentation is crate::Foo, which is a re-export of crate::b::Foo. The generated documentation of the crate will display:

So the documentation from all the re-exports and from the original item was added, as expected. To see the issue with this code, we need to go a bit deeper: the resolution of crate::b::Foo points directly to crate::a::Foo. Great, no? There is just one tiny little issue: the crate::b::Foo re-export has documentation, and it needs to be displayed. So how do we get the documentation of all the re-exports between the last re-export and the item?

rustc internals don't provide an easy solution for that, so we needed to work around this limitation. Since we have the path, we can get the "path parent" of the last re-export and look for the next re-export directly from there:

crate::b::Foo => parent is crate::b
crate::b resolves to b module.
We iterate through all items of the b module and look for an item named Foo, or for the original Foo item directly.
We get all of that item's attributes and merge them with those of the first re-export.
If we didn't find the original item, we repeat.
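The steps above can be sketched on a toy model of the crate from the previous example. Everything here is hypothetical: `Entry` and `collect_docs` are made up, and modules are keyed by path strings in place of rustc's resolution data. The sketch collects only doc strings, not full attributes. The walk is also bounded so that a malformed, cyclic table cannot loop forever.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified module entry; rustc's real data structures differ.
#[derive(Debug)]
enum Entry {
    /// An item defined in the module, with its doc comment.
    Item { name: &'static str, docs: &'static str },
    /// A `use` in the module: `path` is the full path it imports.
    ReExport { name: &'static str, path: &'static str, docs: &'static str },
}

impl Entry {
    fn name(&self) -> &'static str {
        match self {
            Entry::Item { name, .. } | Entry::ReExport { name, .. } => *name,
        }
    }
}

/// Follows re-exports starting from `path`, collecting the docs of every
/// re-export met along the way and, last, those of the original item.
/// Returns `None` if a path cannot be found in the model.
fn collect_docs(modules: &HashMap<&str, Vec<Entry>>, path: &str) -> Option<Vec<&'static str>> {
    let mut docs = Vec::new();
    let mut current: &str = path;
    // Each step visits one entry, so more steps than entries means a cycle.
    let max_steps: usize = modules.values().map(Vec::len).sum();
    for _ in 0..max_steps {
        // `crate::b::Foo` => look for `Foo` in the module `crate::b`.
        let (parent, name) = current.rsplit_once("::")?;
        let entry = modules.get(parent)?.iter().find(|e| e.name() == name)?;
        match entry {
            Entry::ReExport { path: next, docs: d, .. } => {
                docs.push(*d);
                current = *next; // not the original item yet: repeat
            }
            Entry::Item { docs: d, .. } => {
                docs.push(*d);
                return Some(docs);
            }
        }
    }
    None
}
```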

And with this, we solved one problem. However, there is a big limitation to this solution: if any of the re-exports comes from an external crate, rustdoc cannot retrieve it, because the Rust ABI doesn't know about re-exports (they are discarded, and only the re-exported item's information is kept).

The pull request for this fix is here.

Re-export of re-export of private item

The title sounds a bit complicated, but the bug comes directly from the same area. Here is an example of the issue:

pub mod a {
    mod b {
        pub struct Foo;
    }
    pub use self::b::Foo; // Should be inlined.
}

pub use self::a::Foo; // Should not be inlined!

Here, only pub use self::b::Foo should be inlined, because it re-exports an item from a private module. However, pub use self::a::Foo re-exports an already inlined item and therefore should not be inlined! The problem is that when deciding whether or not an item should be inlined, we only looked at the resolved item (a::b::Foo) and not at the "item at the path" (a::Foo). So here, when we checked self::a::Foo, we were actually checking a::b::Foo, which is in a private module and therefore, based on the inlining rules, should be inlined.

The solution was similar to how the previous bug was fixed: we look directly at the item pointed to by the path (so a::Foo) and check whether this item needs to be inlined instead.
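The difference between the buggy check and the fixed one can be shown on a toy model. `Crate`, `resolve` and the two `should_inline_*` methods are hypothetical names for this sketch. Paths are plain strings, with the `self::` paths written out from `crate`. The only inlining rule modeled is "the item lives in a non-public module".

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified view of a crate; not rustdoc's real data.
struct Crate {
    /// Paths of the modules that are public.
    public_modules: HashSet<&'static str>,
    /// `use` items: path where the name becomes visible -> path it imports.
    reexports: HashMap<&'static str, &'static str>,
}

impl Crate {
    /// Follows re-exports down to the original item, like a resolution does.
    fn resolve<'a>(&self, path: &'a str) -> &'a str {
        let mut current = path;
        // Bounded so a malformed, cyclic table cannot loop forever.
        for _ in 0..=self.reexports.len() {
            match self.reexports.get(current) {
                Some(&next) => current = next,
                None => break,
            }
        }
        current
    }

    /// Is the module containing `path` public?
    fn parent_is_public(&self, path: &str) -> bool {
        path.rsplit_once("::")
            .map_or(true, |(parent, _)| self.public_modules.contains(parent))
    }

    /// Old behavior: only looks at the resolved (final) item.
    fn should_inline_buggy(&self, use_path: &str) -> bool {
        !self.parent_is_public(self.resolve(use_path))
    }

    /// Fixed behavior: looks at the item found at the path written in the `use`.
    fn should_inline_fixed(&self, use_path: &str) -> bool {
        !self.parent_is_public(use_path)
    }
}
```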

The pull request to fix this bug is here.

Re-export attributes

I mentioned before that we were merging the attributes of all the "in-between" re-exports in order to get all the documentation. However, there is a problem with that as well: what happens if an attribute is #[doc(hidden)] or #[doc(no_inline)]? If we merge these attributes, the re-export will simply not appear (or not be inlined) in the final documentation. Definitely not what we want!

We need to filter out such attributes. Sounds easy enough, right? So let's take a look at what an attribute looks like for rustc by checking the Attribute type. It provides some information: what kind of attribute it is, its ID, its "style" (either inner #![] or outer #[]), and a span (the location where it's defined in the source code). And that's it. Obviously, that's not enough for us. However, it also provides a lot of methods that give us what we want. If fn ident returns "doc", it means we want to look inside. Luckily for us, there is fn meta_item_list, which allows us to do that.

So we can now detect whether it's an attribute we want to filter out or not. Great! Now, what happens with an attribute like #[doc(no_inline, alias = "Alias")]? We don't want to remove the whole attribute, only the no_inline part. Back to Attribute::kind: if it's an AttrKind::DocComment, there's nothing to do, and we keep everything. Otherwise, we have a NormalAttr, which contains nothing that allows us to modify it... Once again, we need to work around the limitations and write our own. AttrItem::item contains what we need. From there, we basically re-create a TokenStream without the parts we don't want and set it to AttrItem::tokens.

If you've never written a proc-macro (and actually, even if you have), you might never have heard of TokenStream. It is "an abstract sequence of tokens, organized into TokenTrees". In short, and very simplified, it allows you to manipulate source code.
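Here is a simplified sketch of the filtering idea. It assumes each `#[doc(...)]` attribute has already been split into its meta items as strings. The real fix works on rustc's `Attribute` and rebuilds a `TokenStream`, which this toy `Attr` enum and `filter_reexport_attrs` function (both hypothetical) do not attempt.

```rust
/// Hypothetical, simplified attribute model; rustc's real `Attribute`
/// stores tokens, not strings.
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    /// `/// some text`: always kept as-is.
    DocComment(String),
    /// `#[doc(a, b = "c")]`, already split into its meta items.
    Doc(Vec<String>),
    /// Any other attribute, kept verbatim.
    Other(String),
}

/// Meta items that must not be copied from an in-between re-export.
const DROPPED: [&str; 2] = ["hidden", "no_inline"];

/// Removes `hidden` and `no_inline` from `#[doc(...)]` attributes, dropping
/// an attribute entirely if nothing is left in it.
fn filter_reexport_attrs(attrs: Vec<Attr>) -> Vec<Attr> {
    attrs
        .into_iter()
        .filter_map(|attr| match attr {
            Attr::Doc(items) => {
                let kept: Vec<String> = items
                    .into_iter()
                    .filter(|item| !DROPPED.contains(&item.as_str()))
                    .collect();
                if kept.is_empty() {
                    None
                } else {
                    Some(Attr::Doc(kept))
                }
            }
            other => Some(other),
        })
        .collect()
}
```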

The pull request for this fix is here.

Re-exports of items with the same name

An interesting problem we encountered was: "what happens when re-exporting items with the same name?" Example:

mod nested {
    /// Foo the struct
    pub struct Foo {}

    #[allow(non_snake_case)] // To avoid warnings.
    /// Foo the function
    pub fn Foo() {}
}

pub use crate::nested::Foo;

Here, crate::nested::Foo is both the struct and the function, which means both need to be displayed in the documentation. And to push it a bit further:

pub use crate::nested::Foo;
pub use crate::Foo as Bar;

We then need to re-export both Foo items as Bar items as well.
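Here is a compilable version of the example, slightly extended for illustration (the derives and the function's `u32` return value are additions). It works because a braced struct only lives in the type namespace, while a function lives in the value namespace. A single `use` therefore imports both, and so does the renaming re-export.

```rust
mod nested {
    /// Foo the struct (braced, so type namespace only).
    #[derive(Debug, PartialEq)]
    pub struct Foo {}

    /// Foo the function (value namespace).
    #[allow(non_snake_case)]
    pub fn Foo() -> u32 {
        42
    }
}

// One `use` imports both the struct and the function named `Foo`...
pub use crate::nested::Foo;
// ...and renaming it re-exports both again, this time as `Bar`.
pub use crate::Foo as Bar;
```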

This was especially tricky for the JSON output, because we generate the ID based on the item kind, and here both Foo items have the same kind (re-export). So how do we differentiate them? The solution I came up with was, in the case of a re-export, to concatenate the re-export ID and the re-exported item ID to prevent an ID conflict. You can take a look at the pull request here.
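Here is a minimal sketch of that idea, assuming string IDs. `json_id` and the `-` separator are made up for this example and do not reproduce the real ID format of rustdoc's JSON output.

```rust
/// Hypothetical ID scheme, for illustration only. An ordinary item keeps its
/// own ID; a re-export gets its ID concatenated with the re-exported item's ID.
fn json_id(own_id: &str, reexported_item: Option<&str>) -> String {
    match reexported_item {
        Some(item_id) => format!("{own_id}-{item_id}"),
        None => own_id.to_string(),
    }
}
```

With this scheme, the single `pub use crate::nested::Foo;` yields two distinct IDs, one per re-exported item.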

Anonymous re-exports

It's possible to have this:

mod ext {
    pub trait Foo {}
}

pub use ext::Foo as _;

If an item is re-exported as "_", we discard it, since such a re-export doesn't give the item a public name. The fix for this was actually very simple: we just need to check whether the re-export is renamed to "_" and, if so, filter it out. And that's pretty much it. The pull request for this fix is here.
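For readers unfamiliar with `as _`: an underscore import brings a trait's methods into scope without binding the trait's name, which is why there is nothing to document. The sketch below shows both the language feature and the filtering rule. The `hello` method, `call_through_anonymous_import` and the hypothetical `is_documented` check were added for illustration.

```rust
mod ext {
    pub trait Foo {
        /// Default method so that the example needs no extra code.
        fn hello(&self) -> &'static str {
            "hello from Foo"
        }
    }
    impl Foo for u8 {}
}

// Brings `Foo`'s methods into scope without binding the name `Foo`.
pub use ext::Foo as _;

/// Compiles only because of the `_` import above.
fn call_through_anonymous_import() -> &'static str {
    5u8.hello()
}

/// Hypothetical helper mirroring the fix: a re-export renamed to `_` has no
/// public name, so it is filtered out of the documentation.
fn is_documented(renamed_to: Option<&str>) -> bool {
    renamed_to != Some("_")
}
```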

Conclusion

As you can see, there are a lot of small traps that make the whole thing quite complicated to handle. There were many other small things, but this blog post is already long enough. Maybe for a follow-up.

Posted on the 08/03/2023 at 19:30 by @GuillaumeGomez
