Interacting with data from FFI in Rust

If you ever need to call a function from another language, you'll have to use FFI (Foreign Function Interface) and very likely handle pointers. To help you with that, here are some tips.

Wrapping pointers using NonNull

The NonNull type is very important when interacting with pointers: it encodes the null check in the type itself, so you can't forget it. In the following code examples, we assume the alignment to be correct. For example, here is an allocation with a manual null check:

use std::mem::size_of;

fn allocate() -> Option<*mut i8> {
    let ptr = unsafe { libc::malloc(size_of::<i8>()) };
    if ptr.is_null() {
        None
    } else {
        Some(ptr as *mut i8)
    }
}

Now with the NonNull type:

use std::mem::size_of;
use std::ptr::NonNull;

fn allocate() -> Option<NonNull<i8>> {
    unsafe { NonNull::new(libc::malloc(size_of::<i8>()) as *mut i8) }
}

Please note that NonNull doesn't solve all issues:

You can still have concurrent access to the data pointed to by the pointer.
You still have to free the memory yourself.
You still need to initialize the memory you allocated.
You can still have dangling pointers.

But at least you can't end up with a null pointer because you forgot to check it. It also improves the readability of your code, and you benefit from the niche optimization (Option<NonNull<T>> is the same size as *mut T).
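To make these two points concrete, here is a small sketch. The `wrap` and `same_size_as_raw_pointer` helpers are made up for this illustration and are not part of any API; only `NonNull::new` and `size_of` come from the standard library.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical helper: wraps a raw pointer coming from FFI.
/// `NonNull::new` returns `None` when the pointer is null.
fn wrap(raw: *mut i8) -> Option<NonNull<i8>> {
    NonNull::new(raw)
}

/// Hypothetical helper: checks that the `Option` costs no extra space,
/// because the null value is used to represent `None`.
fn same_size_as_raw_pointer() -> bool {
    size_of::<Option<NonNull<i8>>>() == size_of::<*mut i8>()
}
```

Passing `std::ptr::null_mut()` to `wrap` gives `None`, while a pointer to a live value gives `Some`.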

Wrapping pointers to iterate

Let's say you have a void* that you got from reading a file containing i32s (the read function doesn't care about the type of the array, it only reads up to the number of bytes you asked for). You could iterate through it in C style like this:

use std::mem::size_of;

let ptr = ptr as *const i32;
for pos in 0..nb_bytes / size_of::<i32>() {
    let nb = unsafe { *ptr.offset(pos as _) };
    // Do something with the number.
}

This is not great because it forces you to keep the size around and to handle the offset yourself. A much better alternative is to use slice::from_raw_parts:

use std::mem::size_of;
use std::slice::from_raw_parts;

let array = unsafe {
    from_raw_parts(ptr as *const i32, nb_bytes / size_of::<i32>())
};
for nb in array {
    // Do something with the number.
}
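Keep in mind that from_raw_parts requires the pointer to be non-null and correctly aligned, even for an empty slice. The examples above assume this is the case. If you would rather check it, a minimal sketch could look like the following (`as_i32_slice` is a hypothetical helper name, and the caller still has to guarantee that the bytes are readable, initialized and not modified while the slice is alive):

```rust
use std::mem::{align_of, size_of};
use std::slice::from_raw_parts;

/// Hypothetical helper: views a byte buffer as a slice of `i32`.
/// Returns `None` if the pointer is null or not aligned for `i32`.
/// Trailing bytes that don't form a full `i32` are ignored.
///
/// # Safety
///
/// `ptr` must point to at least `nb_bytes` readable and initialized bytes
/// which stay valid and unmodified for the whole lifetime `'a`.
unsafe fn as_i32_slice<'a>(ptr: *const u8, nb_bytes: usize) -> Option<&'a [i32]> {
    if ptr.is_null() || (ptr as usize) % align_of::<i32>() != 0 {
        return None;
    }
    let nb_elems = nb_bytes / size_of::<i32>();
    Some(unsafe { from_raw_parts(ptr as *const i32, nb_elems) })
}
```

The checks only cover what can be verified at runtime. The lifetime and the validity of the memory remain your responsibility, which is why the function is still unsafe.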

Iterating over FFI generator

In some FFI APIs, you have to call a function multiple times to iterate through its elements (the C version of an iterator). Let's say the FFI function looks like this:

element_t *get_next(iterator_t *iterator);

Every time you call get_next, you get the next element (or a null pointer once there are none left). So you can simply iterate using C-style code:

use std::ptr::NonNull;

unsafe {
    // Let's "create" the iterator with a completely fictive function.
    let iterator = NonNull::new(create_iterator()).expect("null pointer!");
    loop {
        match NonNull::new(get_next(iterator.as_ptr())) {
            Some(elem) => {
                let elem = elem.as_ref();
                // Do something with elem.
            }
            None => break,
        }
    }
    // `free_iterator` expects the raw pointer, not the `NonNull` wrapper.
    free_iterator(iterator.as_ptr());
}

Not great for multiple reasons:

You have to check all pointers.
You have to free the memory yourself.
You need to use an infinite loop by hand because you don't know how many items you'll have.
You have to dereference a pointer.

To be clear: all these constraints still exist with the Rust wrapper, but they'll be much harder to forget or to get wrong. So instead, let's wrap it into a Rust type which implements the Iterator trait and the Drop trait.

use std::ptr::NonNull;

struct IteratorWrapper<'a> {
    iterator: &'a mut iterator_t,
}

impl IteratorWrapper<'_> {
    fn new() -> Result<Self, &'static str> {
        unsafe {
            let ptr = create_iterator();
            if ptr.is_null() {
                Err("create_iterator failed")
            } else {
                Ok(Self { iterator: &mut *ptr })
            }
        }
    }
}

impl<'a> Iterator for IteratorWrapper<'a> {
    type Item = &'a element_t;

    fn next(&mut self) -> Option<Self::Item> {
        unsafe {
            let next = get_next(self.iterator);
            if next.is_null() {
                None
            } else {
                Some(&*next)
            }
        }
    }
}

impl Drop for IteratorWrapper<'_> {
    fn drop(&mut self) {
        unsafe {
            free_iterator(self.iterator);
        }
    }
}

Then you can simply use it like this:

match IteratorWrapper::new() {
    Ok(iterator) => {
        for elem in iterator {
            // Do something with `elem`.
        }
    }
    Err(e) => eprintln!("IteratorWrapper::new failed: {:?}", e),
}

As you can see, it's much easier to use and read even though the code is longer.
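Another benefit of implementing Iterator is that all the standard adapters (filter, map, sum, ...) come for free. The sketch below is a variation on the wrapper, not the wrapper above: it stores a NonNull instead of a reference and copies each element out instead of returning a reference. The `OwnedIterator` type, the `sum_of_even_elements` function and the three stand-in "C" functions are assumptions made up so that the example is self-contained.

```rust
use std::ptr::{self, NonNull};

// Stand-ins for the C side (hypothetical): a counter yielding 1, 2, 3 and 4.
#[allow(non_camel_case_types)]
type iterator_t = i8;
#[allow(non_camel_case_types)]
type element_t = i8;

unsafe fn create_iterator() -> *mut iterator_t {
    Box::into_raw(Box::new(0i8))
}

unsafe fn get_next(it: *mut iterator_t) -> *mut element_t {
    unsafe {
        if *it < 4 {
            *it += 1;
            it
        } else {
            ptr::null_mut()
        }
    }
}

unsafe fn free_iterator(it: *mut iterator_t) {
    // Gives the allocation back to the `Box` it came from.
    unsafe { drop(Box::from_raw(it)) }
}

struct OwnedIterator {
    raw: NonNull<iterator_t>,
}

impl OwnedIterator {
    fn new() -> Option<Self> {
        NonNull::new(unsafe { create_iterator() }).map(|raw| Self { raw })
    }
}

impl Iterator for OwnedIterator {
    // Elements are copied out, so no lifetime is attached to them.
    type Item = element_t;

    fn next(&mut self) -> Option<Self::Item> {
        unsafe {
            let next = get_next(self.raw.as_ptr());
            if next.is_null() {
                None
            } else {
                Some(*next)
            }
        }
    }
}

impl Drop for OwnedIterator {
    fn drop(&mut self) {
        unsafe { free_iterator(self.raw.as_ptr()) }
    }
}

/// Adapters work as on any other Rust iterator.
fn sum_of_even_elements() -> Option<i32> {
    let iterator = OwnedIterator::new()?;
    Some(iterator.filter(|elem| elem % 2 == 0).map(i32::from).sum())
}
```

Copying the elements only makes sense when they are small and cheap to copy. For bigger elements, returning references like the wrapper above does is the better fit.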

In case you want working code for the unsafe functions to test the example:

type element_t = i8;
type iterator_t = i8;

unsafe fn create_iterator() -> *mut iterator_t {
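    static mut X: i8 = 0;
    // Get a raw pointer without creating a reference to the `static mut`,
    // which recent compilers reject.
    std::ptr::addr_of_mut!(X)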
}

unsafe fn get_next(it: *mut iterator_t) -> *mut element_t {
    if *it < 4 {
        *it += 1;
        it as *mut _
    } else {
        std::ptr::null_mut()
    }
}

unsafe fn free_iterator(_it: *mut iterator_t) {}

Wrap pointers as much as possible!

As you saw above, wrapping FFI code into other types makes it safer to use. However, there is something very important to note: you need to be sure that the data you're iterating over isn't being modified somewhere else at the same time! Raw pointers are primitive types in Rust and implement the Copy trait, which means that the access to the data (the pointer) can be duplicated. It's then up to you to ensure this doesn't happen, for example by creating the pointer directly inside a Rust wrapper (which wouldn't implement the Copy trait).
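Here is a minimal sketch of the difference, using made-up names (`aliasing_demo`, `Owner` and `move_demo` only exist for this illustration). A raw pointer can be copied freely and every copy gives access to the same data, whereas a wrapper which doesn't derive Copy or Clone can only be moved:

```rust
use std::ptr::NonNull;

/// Raw pointers are `Copy`: both copies stay usable and alias the same value.
fn aliasing_demo() -> (i32, i32) {
    let mut value = 1;
    let first: *mut i32 = &mut value;
    let second = first; // A plain copy: `first` is still usable afterwards.
    unsafe {
        *first = 2;
        let seen_through_second = *second;
        *second = 3;
        (seen_through_second, *first)
    }
}

/// Hypothetical owning wrapper: no `Copy` or `Clone`, so it can only be moved.
struct Owner {
    ptr: NonNull<i32>,
}

fn move_demo() -> i32 {
    let mut value = 7;
    let owner = Owner { ptr: NonNull::from(&mut value) };
    let moved = owner;
    // Using `owner` from here on would not compile ("use of moved value").
    unsafe { *moved.ptr.as_ptr() }
}
```

The wrapper doesn't make the pointer itself any safer. It only guarantees that there is a single handle to it, as long as the raw pointer never leaks out of the wrapper.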

Let's take an example using malloc and free functions (which allocate and free blocks of memory) using the libc crate:

use std::mem::size_of;
use std::slice::from_raw_parts;

unsafe {
    let nb_elems: libc::size_t = 18;
    let ptr = libc::malloc(nb_elems * size_of::<i32>() as libc::size_t) as *mut i32;
    if ptr.is_null() {
        panic!("malloc failed");
    }
    // At this point, we have a chunk of uninitialized memory so we have to initialize it:
    for x in 0..nb_elems {
        std::ptr::write(ptr.offset(x as _), 0);
    }

    let array = from_raw_parts(ptr, nb_elems as _);

    // So at this point, still no problem!
    for nb in array {
        libc::free(ptr as *mut _);
        // Any iteration after this point will read memory we don't own!!!
    }
}

We took a very obvious case, but in a multi-threaded context where you share pointers, such situations happen much more easily than you might imagine. Please note that we could have used calloc instead of malloc, but using malloc makes the demonstration easier.

Let's rewrite it with a wrapper this time:

use std::mem::size_of;
use std::ptr::NonNull;
use std::slice::from_raw_parts;

struct ArrayWrapper {
    ptr: NonNull<i32>,
    nb_elems: usize,
}

impl ArrayWrapper {
    fn new(nb_elems: usize) -> Result<Self, &'static str> {
        let nb_bytes = (nb_elems as libc::size_t)
            // This `checked_mul` call is VERY important to prevent invalid
            // memory allocation size because of an overflow!
            .checked_mul(size_of::<i32>() as libc::size_t)
            .ok_or("Allocation is too big")?;
        unsafe {
            let ptr = libc::malloc(nb_bytes as libc::size_t);
            NonNull::new(ptr as *mut i32)
                .map(|non_null_ptr| {
                    // We now initialize all bytes to 0.
                    let tmp_ptr: *mut i32 = non_null_ptr.as_ptr();
                    for x in 0..nb_elems {
                        std::ptr::write(tmp_ptr.offset(x as _), 0);
                    }
                    Self { ptr: non_null_ptr, nb_elems }
                })
                .ok_or("malloc failed")
        }
    }

    fn as_slice(&self) -> &[i32] {
        unsafe { from_raw_parts(self.ptr.as_ptr() as *const _, self.nb_elems) }
    }
}

impl Drop for ArrayWrapper {
    fn drop(&mut self) {
        unsafe {
            libc::free(self.ptr.as_ptr() as *mut _);
        }
    }
}

Which you can then use like this:

match ArrayWrapper::new(18) {
    Ok(array) => {
        for nb in array.as_slice() {
            // Do something with the number.
        }
    }
    Err(e) => eprintln!("ArrayWrapper::new failed: {:?}", e),
}
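A quick word on the checked_mul call in ArrayWrapper::new. The sketch below (with the hypothetical helpers `checked_size` and `wrapped_size`) shows what happens with a number of elements big enough to overflow: a wrapping multiplication silently turns a huge request into a tiny one, whereas checked_mul reports the problem.

```rust
use std::mem::size_of;

/// Size in bytes of `nb_elems` `i32`s, or `None` if it overflows `usize`.
fn checked_size(nb_elems: usize) -> Option<usize> {
    nb_elems.checked_mul(size_of::<i32>())
}

/// Same computation, but wrapping around on overflow, like C's unsigned
/// arithmetic would.
fn wrapped_size(nb_elems: usize) -> usize {
    nb_elems.wrapping_mul(size_of::<i32>())
}
```

With `usize::MAX / 4 + 2` elements, `wrapped_size` returns 4: malloc would happily allocate 4 bytes while the wrapper believes it owns a gigantic array. `checked_size` returns `None` for the same input.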

You can't free the pointer "by mistake", you can't duplicate it, and you can't modify its content from another place in your code. You just erased a lot of common human errors with a small wrapper. Users of the wrapper don't need to think about the unsafe parts: the wrapper handles them all.
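If the memory doesn't have to come from the C allocator, the same wrapper can be sketched with Rust's own allocation API instead of libc. To be clear, this swaps malloc and free for std::alloc, so it is not a drop-in replacement when the pointer has to be freed by C code. The `ZeroedArray` name is made up for this example. Layout::array takes care of the overflow check and alloc_zeroed returns memory which is already initialized to 0:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;
use std::slice::from_raw_parts;

struct ZeroedArray {
    ptr: NonNull<i32>,
    // The same layout must be given back to `dealloc`.
    layout: Layout,
    nb_elems: usize,
}

impl ZeroedArray {
    fn new(nb_elems: usize) -> Result<Self, &'static str> {
        // Allocating a zero-sized layout is undefined behavior, so this
        // sketch simply refuses empty arrays.
        if nb_elems == 0 {
            return Err("Empty arrays are not supported");
        }
        // `Layout::array` fails if the total size overflows.
        let layout = Layout::array::<i32>(nb_elems).map_err(|_| "Allocation is too big")?;
        let ptr = unsafe { alloc_zeroed(layout) }.cast::<i32>();
        NonNull::new(ptr)
            .map(|ptr| Self { ptr, layout, nb_elems })
            .ok_or("Allocation failed")
    }

    fn as_slice(&self) -> &[i32] {
        unsafe { from_raw_parts(self.ptr.as_ptr(), self.nb_elems) }
    }
}

impl Drop for ZeroedArray {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr.as_ptr().cast::<u8>(), self.layout) }
    }
}
```

It is used exactly like ArrayWrapper: `ZeroedArray::new(18)` gives you 18 zeroes through `as_slice`, and the memory is released when the value goes out of scope.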

Conclusion

As you can see, there are a lot of traps when using FFI and we are far from having seen all of them! Being careful is really the bare minimum and is almost never enough...

And finally: thanks a lot to Sebastian for the help and the feedback on this blog post!

Posted on the 29/07/2021 at 13:30 by @GuillaumeGomez
