rustdoc is the tool used to generate documentation in Rust. In this blog post, we will see why handling re-exports in rustdoc is so complicated and how it's done.
So first, let's see what a re-export is by taking an example.
Let's say we are writing a library (named lib) with some types spread across sub-modules:
pub mod sub_module1 {
    pub struct Foo;
}
pub mod sub_module2 {
    pub struct AnotherFoo;
}
Users can import them like this:
use lib::sub_module1::Foo;
use lib::sub_module2::AnotherFoo;
But what if we want the types to be available directly at the crate root, or if we don't want the modules to be visible to users? That's where re-exports come in:
// `sub_module1` and `sub_module2` are not visible outside.
mod sub_module1 {
    pub struct Foo;
}
mod sub_module2 {
    pub struct AnotherFoo;
}
// We re-export both types:
pub use crate::sub_module1::Foo;
pub use crate::sub_module2::AnotherFoo;
And now users will be able to do:
use lib::{Foo, AnotherFoo};
And since both sub_module1 and sub_module2 are private, users won't be able to import them.
Now let's see why rustdoc is having such a hard time handling them.
Let's first explain (very) quickly how rustdoc works: rustc provides an API to get access to its internal data, allowing "custom drivers" (like clippy and rustdoc) to do what they do, although at different levels. It requires a bit of setup first (I'm writing a crate to help with this part, you can check it out here) but once done, you have access to rustc internals.
Now, we can actually go through all the items rustc knows of and start working on them. To do so, we use a Visitor. You just need to implement the methods you're interested in, and they will then be called directly by the visitor.
Once we have all the information we want, we process everything in order to render into the expected format (either HTML or JSON).
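To give an idea of what "implementing only the methods you're interested in" looks like, here is a minimal sketch of the visitor pattern on a toy item tree. The Item, Visitor, walk_item and ReExportCollector names are made up for this illustration; the actual rustc visitor traits are much larger and look different.

```rust
/// Toy stand-in for the items rustc knows about (hypothetical, not a rustc type).
enum Item {
    Module { name: String, items: Vec<Item> },
    Struct { name: String },
    ReExport { path: String },
}

/// Every method has a default (empty) body, so an implementor only
/// overrides the ones it cares about.
trait Visitor {
    fn visit_module(&mut self, _name: &str) {}
    fn visit_struct(&mut self, _name: &str) {}
    fn visit_re_export(&mut self, _path: &str) {}
}

/// Walks the tree and calls the matching visitor method for each item.
fn walk_item<V: Visitor>(visitor: &mut V, item: &Item) {
    match item {
        Item::Module { name, items } => {
            visitor.visit_module(name);
            for child in items {
                walk_item(visitor, child);
            }
        }
        Item::Struct { name } => visitor.visit_struct(name),
        Item::ReExport { path } => visitor.visit_re_export(path),
    }
}

/// A visitor that is only interested in re-exports.
#[derive(Default)]
struct ReExportCollector {
    paths: Vec<String>,
}

impl Visitor for ReExportCollector {
    fn visit_re_export(&mut self, path: &str) {
        self.paths.push(path.to_string());
    }
}
```

Calling walk_item with a ReExportCollector on the crate root fills its paths field with every re-export found, while modules and structs are silently skipped by the default methods.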
Now back to the re-exports.
There are many different problems with re-exports, so let's go through some of them.
First, let's explain what "inlining" means in this context. And as often, let's show it with code and images:
pub mod public {
    pub struct Foo;
    pub struct Bar;
}
#[doc(inline)]
pub use crate::public::Bar;
pub use crate::public::Foo;
And here is what the documentation for this code looks like:
So we can see a module, a re-export and a struct. The struct is public::Bar, which was inlined because of #[doc(inline)]. It means that even though it's a re-export, it is displayed as if it were declared directly in the crate root.
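Apart from #[doc(inline)], a re-export will be inlined if:
- the re-exported item is not publicly reachable (for example, it is declared in a private module);
- one of the parents of the re-exported item is hidden (it has #[doc(hidden)]).
Important to note: if an item has #[doc(hidden)], then the only way to make it appear in a re-export is to add #[doc(inline)] on the corresponding re-export.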
To sum all this up in code:
pub mod public {
    pub struct Bar;
    #[doc(hidden)]
    pub struct PubHidden;
}
#[doc(hidden)]
pub mod hidden {
    pub struct Hidden;
}
mod private {
    pub struct Private;
}
#[doc(inline)] // Needed because `public` is, well, public.
pub use crate::public::Bar;
#[doc(inline)] // Needed because `PubHidden` has `#[doc(hidden)]`.
pub use crate::public::PubHidden;
pub use crate::hidden::Hidden;
pub use crate::private::Private;
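The same rules can also be read as a small decision function. This is only a sketch of my reading of the rules above on a toy model: Target, Rendering and render_re_export are hypothetical names, not rustdoc types, and attributes such as #[doc(no_inline)] are left out.

```rust
/// Toy description of the item a re-export points to (hypothetical model).
struct Target {
    /// The item itself carries `#[doc(hidden)]`.
    is_doc_hidden: bool,
    /// One of its parent modules is private.
    in_private_module: bool,
    /// One of its parent modules carries `#[doc(hidden)]`.
    in_hidden_module: bool,
}

#[derive(Debug, PartialEq)]
enum Rendering {
    /// Displayed as if it were declared where the re-export is.
    Inlined,
    /// Displayed as a `pub use ...` line.
    ReExportLine,
    /// Not displayed at all.
    NotShown,
}

fn render_re_export(has_doc_inline: bool, target: &Target) -> Rendering {
    // An explicit `#[doc(inline)]` always wins.
    if has_doc_inline {
        return Rendering::Inlined;
    }
    // A hidden item only appears through `#[doc(inline)]`.
    if target.is_doc_hidden {
        return Rendering::NotShown;
    }
    // Items that cannot be reached otherwise are inlined automatically.
    if target.in_private_module || target.in_hidden_module {
        return Rendering::Inlined;
    }
    Rendering::ReExportLine
}
```

With this model, the four re-exports of the example all end up inlined: Bar and PubHidden thanks to the attribute, Hidden and Private thanks to where they are declared.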
In rustc internals, a re-export is represented by an enum variant containing:
- path: x::y::z for example.
- kind: to simplify, it's either a glob import (x::*) or a single import (x::y).
Now what's interesting is that the path contains the "resolution" of the import, allowing us to get access to the item being imported/re-exported. And here comes the first big issue. Let's consider this code:
mod a {
    /// Some docs.
    pub struct Foo;
}
mod b {
    /// first
    pub use crate::a::Foo;
}
/// second
pub use crate::b::Foo;
In here, the only visible item in the documentation is crate::Foo, which is a re-export of crate::b::Foo. The generated documentation of the crate will display:
So the documentation from all re-exports and from the original item was added as expected. To see the issue with this code, we need to go a bit deeper: the resolution of crate::b::Foo points directly to crate::a::Foo. Great, no? There is just one tiny little issue: the crate::b::Foo re-export has documentation and it needs to be displayed. So now, how do we get the documentation of all re-exports between the last re-export and the item?
rustc internals don't provide an easy solution for that, so we needed to work around this limitation. Since we have the path, we can get the "path parent" of the last re-export... and then look for the next re-export directly from there:
1. crate::b::Foo => parent is crate::b
2. crate::b resolves to the b module.
3. Go through the b module and look for an item named Foo: either another re-export, or the original Foo item directly.
And with this, we solved one problem. Now there is a big limitation to this solution: if any of the re-exports is from an external crate, then rustdoc cannot retrieve it because the rust ABI doesn't know about re-exports (they are discarded and only the re-exported item information is kept).
The pull request for this fix is here.
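The "parent path, then look up by name" walk described above can be sketched on a toy model. Everything here (Item, Modules, lookup, collect_docs, the hop limit) is hypothetical and only mirrors the idea; rustdoc works on rustc's own data structures, and the order in which the docs are collected is just the order of the walk.

```rust
use std::collections::HashMap;

/// Toy item found inside a module (hypothetical model).
enum Item {
    /// `pub use <path>;` with its own doc comment.
    ReExport { docs: &'static str, path: &'static str },
    /// The original definition.
    Original { docs: &'static str },
}

/// Module path -> (item name -> item).
type Modules = HashMap<&'static str, HashMap<&'static str, Item>>;

/// Splits "crate::b::Foo" into its parent ("crate::b") and name ("Foo"),
/// resolves the parent module and looks for an item with that name in it.
fn lookup<'a>(modules: &'a Modules, path: &str) -> Option<&'a Item> {
    let (parent, name) = path.rsplit_once("::")?;
    modules.get(parent)?.get(name)
}

/// Follows the chain of re-exports starting at `start` and collects the
/// docs of each re-export, then the docs of the original item.
fn collect_docs(modules: &Modules, start: &str) -> Vec<&'static str> {
    let mut collected = Vec::new();
    let mut path = start;
    // Hop limit so that a cyclic toy input cannot loop forever.
    for _ in 0..32 {
        match lookup(modules, path) {
            Some(Item::ReExport { docs, path: next }) => {
                collected.push(*docs);
                path = *next;
            }
            Some(Item::Original { docs }) => {
                collected.push(*docs);
                break;
            }
            // Unknown module or item: the walk stops with what it has.
            None => break,
        }
    }
    collected
}
```

The None arm is where the limitation mentioned above would show up in this model: if a step of the chain cannot be found, the docs of the remaining re-exports are simply missing.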
Title sounds a bit complicated but the bug comes directly from the same area. Example of the issue:
pub mod a {
    mod b {
        pub struct Foo;
    }
    pub use self::b::Foo; // Should be inlined.
}
pub use self::a::Foo; // Should not be inlined!
In here, only pub use self::b::Foo should be inlined because it re-exports a private item. However, pub use self::a::Foo is re-exporting an inlined item and therefore should not be inlined! The problem is that when we decided whether or not an item should be inlined, we only looked at the resolved type (a::b::Foo) and not the "item at the path" (a::Foo). So in here, when we checked self::a::Foo, we were directly checking a::b::Foo, which is in a private module, and based on the inlining rules, it should be inlined.
The solution for this was similar to how the previous bug was fixed: we just need to look directly at the item pointed to by the path (so a::Foo) and check if this item needs to be inlined instead.
The pull request to fix this bug is here.
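To make the difference between the two checks concrete, here is a small sketch on a toy model where the only inlining rule is "the parent module is private". ModuleVisibility, ReExport and both should_inline functions are hypothetical names, and the paths are written with crate:: instead of self:: to keep the lookup simple.

```rust
use std::collections::HashMap;

/// Module path -> is the module public? (toy model, hypothetical)
type ModuleVisibility = HashMap<&'static str, bool>;

/// A re-export as written in the source, plus what it finally resolves to.
struct ReExport {
    /// Path written after `pub use`, e.g. "crate::a::Foo".
    written_path: &'static str,
    /// Fully resolved item, e.g. "crate::a::b::Foo".
    resolved_path: &'static str,
}

/// Returns true if the module containing `path` is private.
/// Unknown modules are treated as private in this toy model.
fn parent_is_private(modules: &ModuleVisibility, path: &str) -> bool {
    match path.rsplit_once("::") {
        Some((parent, _)) => !modules.get(parent).copied().unwrap_or(false),
        None => false,
    }
}

/// Behaviour described as buggy above: only the resolved item is considered.
fn should_inline_old(modules: &ModuleVisibility, re_export: &ReExport) -> bool {
    parent_is_private(modules, re_export.resolved_path)
}

/// Fixed behaviour: consider the item the written path points to.
fn should_inline_fixed(modules: &ModuleVisibility, re_export: &ReExport) -> bool {
    parent_is_private(modules, re_export.written_path)
}
```

For the outer re-export of the example (written crate::a::Foo, resolved crate::a::b::Foo), the old check answers "inline" because b is private, while the fixed check looks at a::Foo, sees the public module a, and answers "don't inline".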
I mentioned before that we were merging the attributes of all "in-between" re-exports in order to get all the documentation. However, there is a problem with that as well: what happens if an attribute is #[doc(hidden)] or #[doc(no_inline)]? If we merge these attributes, the re-export will simply not appear in the final documentation. Definitely not what we want!
We need to filter out such attributes. Sounds easy enough, right? So let's take a look at what an attribute looks like for rustc by checking the Attribute type. It provides some information like what kind of attribute it is, its ID, its "style" (either inner #![] or outer #[]) and a span (location of where it's defined in the source code). And that's it. So obviously, it's not enough for us. It also provides a lot of methods which give us what we want. If fn ident returns "doc", it means we want to look inside. Luckily for us, there is fn meta_item_list which allows us to do that.
So we can now detect whether it's an attribute we want to filter out or not. Great! Now what happens if it's an attribute looking like this: #[doc(no_inline, alias = "Alias")]? We don't want to remove the whole attribute, only the no_inline part. Now back to Attribute::kind: if it's an AttrKind::DocComment, nothing to do, we keep everything. Otherwise, we now have a NormalAttr which contains nothing allowing us to modify it... Again, we will need to work around the limitations and write our own. AttrItem::item contains what we need. From this point, we basically re-create a TokenStream without the parts we don't want and set it to AttrItem::tokens.
If you never wrote a proc-macro (actually, even if you did), you might have never heard about TokenStream. It is "an abstract sequence of tokens, organized into TokenTrees". In short and very simplified, it allows you to manipulate the source code.
The pull request for this fix is here.
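The filtering idea itself (keep doc comments, strip only the unwanted parts of a #[doc(...)] attribute) can be sketched without any rustc type. The Attr enum, the FILTERED list and filter_attrs below are hypothetical and work on plain strings; the real fix rebuilds a TokenStream as explained above.

```rust
/// Toy stand-in for an attribute (hypothetical; not rustc's `Attribute`).
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    /// A `///` doc comment: always kept as is.
    DocComment(String),
    /// `#[doc(...)]` with its comma-separated parts,
    /// e.g. `["no_inline", "alias = \"Alias\""]`.
    Doc(Vec<String>),
}

/// Parts that must not be propagated from an in-between re-export.
const FILTERED: [&str; 2] = ["hidden", "no_inline"];

fn filter_attrs(attrs: &[Attr]) -> Vec<Attr> {
    attrs
        .iter()
        .filter_map(|attr| match attr {
            Attr::DocComment(_) => Some(attr.clone()),
            Attr::Doc(parts) => {
                // Keep every part except the filtered ones.
                let kept: Vec<String> = parts
                    .iter()
                    .filter(|part| !FILTERED.contains(&part.as_str()))
                    .cloned()
                    .collect();
                // Drop the attribute entirely if nothing is left in it.
                if kept.is_empty() {
                    None
                } else {
                    Some(Attr::Doc(kept))
                }
            }
        })
        .collect()
}
```

In this model, #[doc(no_inline, alias = "Alias")] becomes #[doc(alias = "Alias")], while a lone #[doc(hidden)] disappears completely.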
An interesting problem we encountered was: "what happens with re-export items with the same name?" Example:
mod nested {
    /// Foo the struct
    pub struct Foo {}
    #[allow(non_snake_case)] // To avoid warnings.
    /// Foo the function
    pub fn Foo() {}
}
pub use crate::nested::Foo;
Here, crate::nested::Foo is both the struct and the function, meaning we need both to be displayed in the documentation. And if we want to push it a bit further:
pub use crate::nested::Foo;
pub use crate::Foo as Bar;
We then need to re-export both Foo items as Bar items as well.
This was especially problematic for the JSON output because we generate the ID based on the item kind. Except here, both Foo items have the same kind (re-export). So how do we differentiate them? The solution I came up with was, in the case of a re-export, to concatenate the re-export ID and the re-exported item ID to prevent an ID conflict. You can take a look at the pull request here.
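As a rough illustration of the ID conflict and of the concatenation idea, assume IDs made of a kind tag and a definition index. This format, like the naive_id and re_export_id names, is invented for the sketch; the real rustdoc JSON IDs have a different shape.

```rust
/// Hypothetical ID: a kind tag plus the index of the definition.
fn naive_id(kind: &str, def_index: u32) -> String {
    format!("{}:{}", kind, def_index)
}

/// ID for a re-export: its own ID concatenated with the ID of the item
/// it re-exports, so two items coming from the same `pub use` differ.
fn re_export_id(use_index: u32, target_kind: &str, target_index: u32) -> String {
    format!(
        "{}-{}",
        naive_id("use", use_index),
        naive_id(target_kind, target_index)
    )
}
```

With the naive scheme, the struct and the function re-exported by the same pub use would both get the ID of that single use item. With the concatenated form they differ, because the second half comes from the re-exported item.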
It's possible to have this:
mod ext {
    pub trait Foo {}
}
pub use ext::Foo as _;
If an item is re-exported as "_", we discard it. The fix for this was actually very simple: we just need to check whether the re-export is renamed as "_" and, if so, filter it out. And that's pretty much it. The pull request for this fix is here.
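On a toy model, this check boils down to a one-line filter. ReExport and documented_re_exports are hypothetical names used only for this sketch.

```rust
/// Toy re-export (hypothetical model): `pub use <path> as <rename>;`.
#[derive(Debug, PartialEq)]
struct ReExport {
    path: &'static str,
    /// `None` when there is no `as ...` part.
    rename: Option<&'static str>,
}

/// Keeps only the re-exports that bind a name, dropping the `as _` ones.
fn documented_re_exports(re_exports: &[ReExport]) -> Vec<&ReExport> {
    re_exports
        .iter()
        .filter(|re_export| re_export.rename != Some("_"))
        .collect()
}
```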
As you can see, there are a lot of small traps making the whole thing quite complicated to handle. There were a lot of other small things but this blog post is already long enough. Maybe for a potential follow-up.