A little bit of neuroscience and a little bit of computing

  • 107 Posts
  • 343 Comments
Joined 2 years ago
Cake day: January 19th, 2023

  • Thanks for the reply! Also nice curriculum there. I haven't done most of it (or not enough of it), but I've basically written that up as a list of shit I should have under my belt ... so it's nice for me to see personally too.

    I really like that “climbing a spiral” pitch! I wonder how adaptable it would be to learning Rust, however. Or rather, how one could construct said spiral differently; it already feels like The Book spirals upward and outward from a core of general/abstracted programming.

    Yea maybe. For me, and I'd imagine many who've read The Book, a more spiral-ish or biological/horizontal approach to learning references/pointers etc would help a lot. I haven't searched hard for it, but from about mid-way through Ch4 I've thought that a good deep dive on working with the borrow checker would be valuable. Given the blog posts and forum threads we've linked to here, it almost feels like there's a hole in the available material. It could also work a bit like a reference, so long as it has plenty of examples well organised along conceptual lines, which I think it should. If it addressed all the required concepts, then dug into good examples, both trivial and realistic/applied, mapped the relevant problems and solutions back to the concepts and their treatment elsewhere in the book, and also provided reading guides for people of differing backgrounds ... I think it could go quite far. Maybe you'd be just the person? 😉


  • Thanks for the link! Will try to read it.

    small quibble: String, Box, and Vec are all technically pointers that do take ownership (rather, they have ownership of what they’re pointing to). It’s really only “references” in Rust that don’t take ownership. Which, IIRC, is more or less how The Book introduces references in chap 4. So I’m not really sure how what you’re describing would differ from the current state of the book. Nonetheless, I understand the confusion that comes from “learning about” lifetimes so “late”.

    Yea, I was going to specify (lol) that my first time through I really missed, or didn't focus on, what Box and Vec were: possible alternatives to references, or at least interesting objects that combine ownership with pointers to data on the heap. And I'm not sure that's really on me, given the order of ideas in The Book and its emphasis on references. Again, for me right now, "lifetimes" seems to be the connecting concept, in that it clarifies what's going on and what problems are being solved/created. For me, rather than the section in Ch 4, it was an early section in the Rustonomicon (which I was just casually browsing ... I know nothing about unsafe rust) that leaned hard on the centrality of lifetimes.

    Nonetheless, I'm a little keen now to get a grip on how a Box (or other owning + pointing type) can be used as an easier/cleaner substitute for references. I don't have a clear image in my mind and I think I just need some application to work with or example to read through (maybe in the blog post you linked).
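    For what it's worth, here's a minimal sketch of the kind of example I mean, with hypothetical type and function names: a struct that stores a reference has to carry a lifetime parameter and can't outlive the data it borrows, while a struct that owns its data through an owning pointer (here Box<str>, though String or Vec behave the same way) can be built inside a function and handed back out freely.

```rust
// Borrowing version: holds a reference, so it needs a lifetime parameter
// and cannot outlive the data it points to.
struct BorrowedNote<'a> {
    text: &'a str,
}

// Owning version: `Box<str>` is an owning pointer to heap data, so no
// lifetime parameter is needed.
struct OwnedNote {
    text: Box<str>,
}

// Returning an `OwnedNote` is fine: ownership of the heap data moves out
// of the function along with the struct.
fn make_owned_note() -> OwnedNote {
    let local = String::from("made inside a function");
    OwnedNote { text: local.into_boxed_str() }
}

// The borrowing equivalent would NOT compile, because `local` is dropped
// at the end of the function while still borrowed:
// fn make_borrowed_note<'a>() -> BorrowedNote<'a> {
//     let local = String::from("made inside a function");
//     BorrowedNote { text: &local }
// }

fn main() {
    let owned = make_owned_note();
    println!("{}", owned.text);

    // A borrowing struct is fine as long as the owner outlives it.
    let source = String::from("lives in main");
    let borrowed = BorrowedNote { text: &source };
    println!("{}", borrowed.text);
}
```

    The trade-off, as I understand it: the owning version costs a heap allocation but frees you from lifetime bookkeeping, which is roughly the "Box as an easier substitute for references" idea.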

    Thanks again for the link!


  • Generally agreed. Awesome to hear that you're gelling with the language! If I may ask, what particular experience or background knowledge do you think makes you and rust such a good fit? Knowing OCaml (I've certainly heard of OCaml as an adjacent language to rust in terms of concepts and interests)?

    I think it could use a more explicit and concrete list of “requirements”, though, like a dependency tree of prior concepts to acquire. Maybe that could help it strike a better balance between over- and under-explaining things.

    Yea, agreed. In general my policy is that a textbook or monograph has the task of mapping as many prior backgrounds to new understandings as possible. This often requires constructing non-linear content and guides to navigating it, but can often be worthwhile IMO.

    In relation to rust, it seems to me that it's the sort of thing that benefits from openly engaging in directionless "horizontal" learning in order to build up a necessary foundation for then building "vertically" once enough pieces are in place. At least more so than more basic languages.

    A "steep" learning curve, IMO, is often confused with this sort of thing, where in reality you're not learning difficult or "steep" things sequentially, but meandering confusedly around a space non-linearly until things finally "click" (ie, until the foundation is sufficiently formed). The thing is that openly embracing a structure that isn't always building "upward" can make the "journey" feel much less steep.

    I used to run a workshop on git for non-CS academic researchers and I opened with a warning that it was going to be structured like climbing a spiral, starting with the problems it solves, then basic ideas, then how those ideas can solve the problems, then a demonstration of each followed by painting a complete picture of how git would fit into one's workflows. And I found it worked well. People often found it boring at first, but really appreciated it once things started clicking and had fewer confusions I think than if things were done differently.


  • Sidenote: I wonder if a language could be developed that allowed the programmer to define or setup their own set of constraints around lifetimes and references. Sort of like what Zig seems to allow for memory allocator/allocation strategies, but specifically for ownership semantics. As I type this out, I can't shake the feeling that this is basically what Rust has already done with their & vs &mut vs Box vs Rc vs RefCell etc.

    Yep to all of your post ... and this is my impression too (without having gotten on top of the smart pointers yet).

    The only thing I can imagine adding, though, would be an optional garbage collector (shipped in the binary, like with Go). I'm not sure how helpful one would be beyond Rc/RefCell (I'd imagine a good GC could handle a complex network of references better, which wouldn't be very "rust"-like, but expanding the scope of the language toward "just getting stuff done" territory would be the main point of a GC). A quick search turned up this project which seems to be very small but still active (interestingly, the blog post linked there points out that rust used to have a built-in GC but removed it, leaving that to a 3rd party library instead).
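    For reference, a minimal sketch of the Rc/RefCell combination mentioned above (the function name shared_mutation is hypothetical): Rc gives a value several owners via reference counting, and RefCell moves the "one writer OR many readers" check from compile time to runtime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Builds two owners of one heap-allocated, interior-mutable Vec, mutates it
// through the second owner, and reports what the first owner sees.
fn shared_mutation() -> (Vec<i32>, usize) {
    let first_owner = Rc::new(RefCell::new(vec![1, 2, 3]));
    // `Rc::clone` copies the pointer and bumps the reference count;
    // it does not copy the Vec itself.
    let second_owner = Rc::clone(&first_owner);

    // `borrow_mut` checks the borrow rules at runtime (it panics if a
    // conflicting borrow is alive) instead of at compile time.
    second_owner.borrow_mut().push(4);

    // Clone the contents out so they can be returned.
    let seen = first_owner.borrow().clone();
    (seen, Rc::strong_count(&first_owner))
}

fn main() {
    let (seen, owners) = shared_mutation();
    println!("{:?} seen by {} owners", seen, owners); // [1, 2, 3, 4] seen by 2 owners
}
```

    This is about as close to GC-style shared data as safe Rust gets without a library, with the caveat that reference cycles built from Rc are never freed (that's what Weak is for).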

    Re: lifetime notations, it’s good to know that this helped you. For me it was the combination of encountering the particular part in chap 4 of The Book where they examine the case of a function that accepts a tuple and returns a ref to the first element (or something along those lines), and trying to define (and then use) a struct that would store a ref to some other data in a hobby project.

    Yea, the structure and framing of The Book when lifetimes were first brought up (in Ch 4) didn't work for me, as it came across as just another series of problems to solve or constraints to navigate.

    What I find interesting and compelling about the framing in my top post is that it conceptualises references and borrowing by starting with lifetimes. I'm not as experienced as you and haven't internalised rust like you have (awesome to read though!), but I think I would have found it better to go from the basic idea of pointers as a way of not taking ownership and then going straight into "everything has a lifetime with changing permissions/locks over that time depending on what other references exist and while rust infers these lifetime durations sometimes, sufficient complexity requires you to specify them yourself" and then building out from that basis. For instance, I'm not even sure The Book makes it clear that Rust is inferring variable lifetimes automatically (??).
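    For what it's worth, Rust does infer lifetimes in two places: inside function bodies the borrow checker works out how long each reference is actually used, and in function signatures the "lifetime elision" rules fill in annotations for common patterns. A small sketch of the latter, with hypothetical function names; as far as the compiler is concerned, these two signatures are the same:

```rust
// Elided: with exactly one reference input, the compiler assumes the
// returned reference borrows from that input.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Explicit: the same signature with the lifetime written out by hand.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let sentence = String::from("hello borrow checker");
    println!("{}", first_word(&sentence)); // hello
    println!("{}", first_word_explicit(&sentence)); // hello
}
```

    Annotations only become mandatory when the elision rules can't decide, eg a function taking two references and returning one.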


  • I got the feeling from the thread that the author's perspective and framing is coming from having to help many people with their problems and likely misuses and misunderstandings of rust references, where your point of not having "paid enough attention to how the book introduces references" may be applicable to many of the people they've helped.

    For me, while there isn't anything conceptually new in their framing, I found the emphasis and perspective helpful and affirming, in part because I hadn't got to the point of verbalising the "locks" metaphor as you say you probably did, but instead was still thinking of it all as a set of constraints to work within and problems to avoid. Framing the situation in a more "active"/"positive" (as in posit) way, where the emphasis is on what references "do" rather than just on what they "prevent", seems helpful to me, and that's the part that I feel is missing from "The Book" or anything else I've read. It also feels like a better way of explaining why lifetime annotations become necessary (which I was generally wondering about, and which is how I ended up reading those threads).

  • Yep. I’m with you on all of that!

    The pitching of The Book is definitely off (this was my attempt to write a basic intro to the borrow checker, just to see where my own brain was at, but also out of a somewhat fanciful interest in what a better version could look like).

    I wonder if the lack of C or assembly equivalents is because the internals aren’t stable??

    And yea, optimising data copies on the first go seems to be a trap (for me too!)

    Do you know if there are any good tools for analysing the hot spots of data copying?

  • 4. Any hard-learnt lessons? Or tried-and-true tips?

    A basic lesson or tip from a discussion in this community (link here):

    PS: Abso-fucking-lutely just clone and don’t feel bad about it. Cloning is fine if you’re not doing it in a hot loop or something. It’s not a big deal. The only thing you need to consider is whether cloning is correct - i.e. is it okay for the original and the clone to diverge in the future and not be equal any more? Is it okay for there to be two of this value? If yes, then it’s fine.

    IE, using copy/clone as an escape hatch for ownership issues is perfectly fine.
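    A minimal sketch of the escape hatch (consume is a hypothetical function that takes ownership): instead of restructuring the code to satisfy the borrow checker, hand the function its own clone. The last lines show the "is cloning correct?" question in action, since the clone and the original diverge from then on.

```rust
// A hypothetical function that takes ownership of its argument.
fn consume(names: Vec<String>) -> usize {
    names.len()
}

fn main() {
    let names = vec![String::from("ada"), String::from("grace")];

    // Only the clone is moved into `consume`, so `names` stays usable.
    let count = consume(names.clone());
    println!("{} names, still have {:?}", count, names);

    // The clone and the original are independent from here on:
    // changing one is not visible through the other.
    let mut copy = names.clone();
    copy.push(String::from("linus"));
    println!("{} vs {}", copy.len(), names.len()); // 3 vs 2
}
```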


    Another one that helps put ownership into perspective I think is this section in the Rustonomicon on unsafe rust, and the section that follows:

    There are two kinds of reference:

    • Shared reference: &
    • Mutable reference: &mut

    Which obey the following rules:

    • A reference cannot outlive its referent
    • A mutable reference cannot be aliased

    That's it. That's the whole model references follow.

    Of course, we should probably define what aliased means.

    error[E0425]: cannot find value `aliased` in this scope
     --> <rust.rs>:2:20
      |
    2 |     println!("{}", aliased);
      |                    ^^^^^^^ not found in this scope
    
    error: aborting due to previous error
    

    Unfortunately, Rust hasn't actually defined its aliasing model. 🙀

    While we wait for the Rust devs to specify the semantics of their language, let's use the next section to discuss what aliasing is in general, and why it matters.


    Basically it highlights that rust's inferential understanding of the lifetimes of variables is a bit coarse (and maybe a work in progress?) ... so when the compiler raises an error about ownership, it's being cautious (as The Book stresses, code the compiler rejects may not actually have any undefined behaviour).

    I think it helps reframe the whole thing as not being exclusively about correctness, but about making sure memory bugs don't happen.
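    One concrete case of that caution (a standard example, not one taken from the Rustonomicon passage itself): borrowing two different elements of a vector mutably. The elements never overlap, but the borrow checker only sees two mutable borrows of the whole vector. The standard library's split_at_mut wraps the (internally unsafe) proof that two halves are disjoint. The function name add_second_to_first is hypothetical.

```rust
// Adds the second element of `v` into the first.
// Assumes `v` has at least two elements (otherwise the indexing panics).
fn add_second_to_first(v: &mut Vec<i32>) {
    // Rejected (compile error E0499), even though v[0] and v[1] are
    // different memory: the second mutable borrow overlaps the first.
    // let a = &mut v[0];
    // let b = &mut v[1];
    // *a += *b;

    // Accepted: `split_at_mut` returns two non-overlapping mutable slices.
    let (left, right) = v.split_at_mut(1);
    left[0] += right[0];
}

fn main() {
    let mut v = vec![1, 2, 3];
    add_second_to_first(&mut v);
    println!("{:?}", v); // [3, 2, 3]
}
```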


    The last lesson I think I've gained after chapter 4 is that the implementation details of any particular method or type matter. The quiz in chapter 6 (question 5) I've mentioned is, I think, a good example of this. So is what exactly the Copy and Clone traits are all about: looking into those made me comfortable with the general problem space I was navigating when working with ownership in rust. Obviously the compiler is the safeguard, but you don't always want to get beaten over the head with ownership problems.
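    A minimal sketch of the Copy/Clone distinction, using two hypothetical types: a small plain-data struct can be Copy, so assignment silently duplicates it, whereas a struct that owns heap data (a String) can only be Clone, so duplicating it has to be asked for explicitly and plain assignment is a move.

```rust
// Plain data: assignment copies the bits and the original stays usable.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Owns heap data, so it can be `Clone` but not `Copy`.
#[derive(Debug, Clone, PartialEq)]
struct Label {
    text: String,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // implicit copy
    println!("{:?} {:?} (x + y = {})", p1, p2, p1.x + p1.y); // both usable

    let l1 = Label { text: String::from("hi") };
    let l2 = l1.clone(); // explicit deep copy of the heap data
    let l3 = l1; // a move: `l1` can't be used after this line
    println!("{:?} {}", l2, l3.text);
}
```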


  • 2. Any persistent gripes, difficulties or confusions?

    I'm not entirely sure why, but the whole Double-Free issue never quite sank in from chapter 4. It's first covered, I think, here, in section 4.3: Fixing an Unsafe Program: Copying vs. Moving Out of a Collection

    I think it was because the description of the issue kinda conflated ownership and the presence or absence of the Copy trait, which isn't covered until way after chapter 4. Additionally, it seems that the issue mechanically comes down to whether the value of a variable is actually a pointer to a heap allocation or not (??)
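    A minimal sketch of the mechanics (function names are hypothetical): indexing into a collection of i32 and binding the result copies the integer, but doing the same with a collection of String would move the String (its pointer, length and capacity) out while the vector still owns the heap buffer, so the compiler refuses. Strictly, what decides it is whether the element type implements Copy, which a type that owns heap memory can't.

```rust
// Copying an element out of a collection of `Copy` values is fine.
fn first_int(v: &[i32]) -> i32 {
    v[0] // the i32 is copied; the collection keeps its own copy
}

// Moving a `String` out by indexing is rejected (compile error E0507):
// fn first_string(v: &[String]) -> String {
//     v[0]
// }

// Fix 1: borrow the element instead of moving it.
fn first_string_ref(v: &[String]) -> &str {
    &v[0]
}

// Fix 2: clone the element, so the caller gets its own heap allocation.
fn first_string_owned(v: &[String]) -> String {
    v[0].clone()
}

fn main() {
    let ints = vec![1, 2, 3];
    let strings = vec![String::from("Hello"), String::from("world")];
    println!("{}", first_int(&ints));
    println!("{}", first_string_ref(&strings));
    println!("{}", first_string_owned(&strings));
}
```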

    It also tripped me up later, in the ownership recap quiz in chapter 6, where I didn't pick it up correctly.

    Here's the first quiz question that touches on it (see Q2 in The Book here, by scrolling down).

    Which of the following best describes the undefined behavior that could occur if this program were allowed to execute?

    let s = String::from("Hello world");
    let s_ref = &s;
    let s2 = *s_ref;
    println!("{s2}");
    

    For those not clear on it: if this code were permitted to execute, s2 would be a copy of the String's internal pointer (plus its length and capacity), pointing at the same heap buffer that s points to. So when the scope ends and both s and s2 are dropped, each would try to deallocate that same heap buffer. The second deallocation would be freeing memory that has already been freed (a "double free"), which is undefined behaviour.

    I find this simple enough, but I feel like the issue can catch me whenever the code or syntax obscures that a pointer would be copied, not some other value, like in the recap quiz in chapter 6 that I got wrong and linked above.
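    For completeness, a sketch of the two usual ways to make the quiz program compile (helper names are hypothetical): clone, so the copy gets its own heap buffer and each owner frees a different allocation, or simply keep borrowing instead of taking ownership.

```rust
// Fix 1: clone, so the result owns a separate heap buffer.
fn owned_copy(s_ref: &String) -> String {
    s_ref.clone()
}

// Fix 2: don't take ownership at all; keep reading through the reference.
fn keep_borrowing(s_ref: &String) -> &str {
    s_ref.as_str()
}

fn main() {
    let s = String::from("Hello world");
    let s_ref = &s;

    // The quiz version is rejected (compile error E0507): it would move the
    // String out from behind `s_ref`, leaving two owners of one buffer.
    // let s2 = *s_ref;

    let s2 = owned_copy(s_ref);
    println!("{s2}");
    println!("{}", keep_borrowing(s_ref));
    println!("{s}"); // the original owner is untouched
}
```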


  • If I had to explain ownership in rust (based on The Book, Ch 4)

    I had a crack at this and found myself writing for a while. I thought I'd pitch it at a basic level and try to provide a sort of core, essential conceptual basis (something which I think The Book might be lacking a little??)

    Dunno if this will be useful or interesting to anyone, but I found it useful to write. If anyone does find any significant errors, please let me know!

    General Idea or Purpose

    • Generally, the whole point is to prevent memory mismanagement.
    • IE, "undefined behaviour", such as memory being read or written after it is no longer controlled by any variable in the program.
      • Rust leans a little cautious in preventing this. It will raise compilation errors for some code that wouldn't actually cause undefined behaviour. This is in large part, AFAICT, because its means of detecting how long a variable "lives" can be somewhat coarse/incomplete (see the Rustonomicon). Thus, rust enforces relatively clean variable management, and simply copying data will probably be worth it at times.

    Ownership

    • Variables live in, or are "owned by" a particular scope (or stack frames, eg functions).
    • Data, memory, or "values" are owned by variables, and only one at a time.
    • Variables are stuck in their scopes (they live and die in a single scope and can't be moved out).
    • Data or memory can be moved from one owning variable to another. In doing so they can also move from one scope to another (eg, by passing a variable into a function).
    • Once a variable has its data/memory moved to another, that variable is dead.
    • If data/memory is not moved away from its variable by the completion of its scope, that data/memory "dies" along with the variable (IE, the memory is deallocated).
    // > ON THE HEAP
    
    // Ownership will be "moved" into this function's scope
    fn take_ownership_heap(_: Vec<i32>) {}
    
    let a = vec![1, 2, 3];
    take_ownership_heap(a);
    
    // ERROR
    let b = a[0];
    // CAN'T DO: value of `a` is borrowed/used after move
    // `a` is now "dead", it died in `take_ownership_heap()`;
    

    • Variables of data on the stack (eg integers) are implicitly copied (as copying basic data types like integers is cheap and unproblematic), so ownership isn't so much of an issue.
    • Copying (or cloning) data/memory on the heap is not trivial and so must be done explicitly (eg, with my_variable.clone()), and in the case of custom types (eg structs) the Clone trait has to be derived or implemented for that particular type (which isn't necessarily difficult).
    // > ON THE STACK
    
    // An integer will be copied into `_`, and no ownership will be moved
    fn take_ownership_stack(_: i32) {}
    
    let x = 11;
    take_ownership_stack(x);
    
    let y = x * 10;
    // NOT A PROBLEM, as x was copied into take_ownership_stack
    

    Borrowing (with references)

    • Data can be "borrowed" without taking ownership.
    • This kind of variable is a "reference" (AKA a "non-owning pointer").
    • As the variable doesn't "own" the data, the data can "outlive" the reference.
      • Useful for passing a variable's data into a function without it "dying" in that function.
    fn borrow_heap(_: &Vec<i32>) {}
    
    let e = vec![1, 2, 3];
    // pass in a reference
    borrow_heap(&e);
    
    let f = e[0];
    // NOT A PROBLEM, as the data survived `borrow_heap`
    // because `e` retained ownership.
    // &e, a reference, only "borrowed" the data
    
    • But it also means that the abilities or "permissions" of the reference with respect to the data are limited and more closely managed in order to prevent undefined behaviour.
    • The chief limitation is that two references cannot exist at the same time where one can mutate the data it points to and another can read the same data.
    • Multiple references can exist that only have permission to read the same data, that's fine.
    • The basic idea is to prevent data from being altered/mutated while something else is reading the same data, as this is a common cause of problems.
    • Commonly expressed as the Pointer Safety Principle: data should never be aliased and mutated at the same time.
    • For this reason, shared references (&) are "read only" references, while unique references (&mut), AKA mutable references, enable their underlying data to be mutated.
      • A minor confusion that can arise here is between mutable (unique) references and reference variables that are themselves mutable. A unique reference can mutate the data it points to, while a mutable variable that holds a reference can be changed to point at different data. These are independent aspects and can be freely combined.
      • Perhaps easily understood by recognising that a reference is just another variable whose data is a pointer or memory address.
    • Additionally, while variables of data on the stack typically don't run into ownership issues because whenever ownership would be moved the data is implicitly copied, references to such variables can exist and they are subject to the same rules and monitoring by the compiler.
    // >>> Can have multiple "shared references"
    
    let e_ref1 = &e;
    let e_ref2 = &e;
    
    let e1 = e_ref1[0];
    let e2 = e_ref2[0];
    
    // >>> CANNOT have shared and mutable/unique references
    
    let mut j = vec![1, 2, 3];
    
    // A single mutable or "unique" reference
    let j_mut_ref = &mut j;
    // can mutate the actual vector
    j_mut_ref[0] = 11;
    
    // ERROR
    let j_ref = &j;
    // CANNOT now have another shared/read-only reference while also having a mutable one (j_mut_ref)
    // mutation actually needs to occur after the shared reference is created
    // in order for rust to care, otherwise it can recognise that the mutable
    // reference is no longer used and so doesn't matter any more
    j_mut_ref[1] = 22;
    
    // same as above but for stack data
    let mut j_int = 11;
    let j_int_mut_ref = &mut j_int;
    // ERROR
    let j_int_ref = &j_int;
    // CANNOT assign another reference as mutable reference already exists
    
    *j_int_mut_ref = 22;
    // dereference to mutate here and force rust to think the mutable reference is still "alive"
    
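    A short sketch of the "mutable reference" vs "mutable variable holding a reference" distinction from the bullets above (the helper name repoint_and_write is hypothetical): let mut r = &a lets you re-point r but not write through it, while an immutable binding to a &mut lets you write through it but not re-point it.

```rust
// Returns what `r` points at after being re-pointed, and `c` after being
// written through `m`.
fn repoint_and_write() -> (i32, i32) {
    let a = 1;
    let b = 2;

    // Mutable *variable* holding a shared reference:
    // the pointer can change, the data behind it cannot.
    let mut r = &a;
    println!("r starts at {}", r);
    r = &b; // fine: `r` now points at `b`
    // *r = 5; // error: cannot assign through a `&` reference
    let seen_through_r = *r;

    // Immutable variable holding a *mutable* reference:
    // the data can change, the pointer cannot.
    let mut c = 3;
    let m = &mut c;
    *m += 10; // fine: writes through the reference
    // m = &mut some_other; // error: `m` itself is not `mut`

    (seen_through_r, c)
}

fn main() {
    let (through_r, c) = repoint_and_write();
    println!("{} {}", through_r, c); // 2 13
}
```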

    Ownership and read/write permissions are altered when references are created

    • The state of a variable's ownership and read-only or mutable permissions is not really static.
    • Instead, they are altered as variables and references are created, used, left unused and then "die" (ie, depending on their "life times").
    • This is because the problem being averted is multiple variables mangling the same data. So what a variable or reference can or cannot do depends on what other related variables exist and what they are able to do.
    • Generally, these "abilities" can be thought of as "permissions".
      • "Ownership": the permission a variable has to move its ownership to another variable or "kill" the "owned" data/memory when the variable falls out of scope.
      • "Read": permission to read the data
      • "Write": permission to mutate the data or write to the referenced heap memory
    • As an example of permissions changing: a variable loses "ownership" of its data while a reference to it exists. This prevents the variable from moving its data into another scope, where it could "die" and be deallocated, while the reference could still be used to read or write the now-deallocated memory.
    • Similarly, a variable that owns its data/memory/value will lose all permissions if a mutable reference (or unique reference) is made to the same data/variable. This is why a mutable reference is also known as a unique reference.
    • Permissions are returned when the reference(s) that altered permissions are no longer used, or "die" (IE, their lifetime comes to an end).
    // >>> References remove ownership permissions
    
    fn take_ownership_heap(_: Vec<i32>) {}
    
    let k = vec![1, 2, 3];
    
    let k_ref = &k;
    
    // ERROR
    take_ownership_heap(k);
    // Cannot move out of `k` into `take_ownership_heap()` as it is currently borrowed
    let k1 = k_ref[0];
    // if the shared reference weren't used here, rust would be happy...
    // as the reference's lifetime would be considered over
    
    // >>> Mutable references remove read permissions
    
    let mut m = 13;
    
    let m_mut_ref = &mut m;
    
    // ERROR
    let n = m * 10;
    // CANNOT read or use `m` as it's mutably borrowed
    *m_mut_ref += 1;
    // again, must use the mutable reference here to "keep it alive"
    
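    The flip side, sketched with a hypothetical variant of take_ownership_heap that returns the Vec's length: once a reference's last use is behind us, its lifetime is over and the owner gets its permissions back, so both of the examples that errored above compile when the reference is used *before* the owner.

```rust
// Hypothetical variant: takes ownership of the Vec and reports its length.
fn take_ownership_heap(v: Vec<i32>) -> usize {
    v.len()
}

fn first_then_move() -> (i32, usize) {
    let k = vec![1, 2, 3];
    let k_ref = &k;
    let k1 = k_ref[0]; // last use of `k_ref`: the borrow ends here
    // `k` has its ownership permission back, so moving it is allowed
    let len = take_ownership_heap(k);
    (k1, len)
}

fn main() {
    let mut m = 13;
    let m_mut_ref = &mut m;
    *m_mut_ref += 1; // last use of the mutable reference
    let n = m * 10; // `m` is readable again
    println!("{}", n); // 140

    println!("{:?}", first_then_move()); // (1, 3)
}
```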

    Lifetimes are coming

    fn first_or(strings: &Vec<String>, default: &String) -> &String {
        if strings.len() > 0 {
            &strings[0]
        } else {
            default
        }
    }
    
    // Does not compile
    error[E0106]: missing lifetime specifier
     --> test.rs:1:57
      |
    1 | fn first_or(strings: &Vec<String>, default: &String) -> &String {
      |                      ------------           -------     ^ expected named lifetime parameter
      |
      = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `strings` or `default`
    
    • In all of the above, the dynamics of what permissions are available depends on how long a variable is used for, or its "lifetime".
    • Lifetimes are something that rust detects by inspecting the code. As stated above, it can be a bit cautious or coarse in this detection.
    • This can get to the point where you will need to explicitly provide information as to the length of a variable's lifetime in the code base. This is done with lifetime annotations, the 'a in the following signature: fn longest<'a>(x: &'a str, y: &'a str) -> &'a str.
    • They won't be covered here ... but they're coming.
    • Suffice it to appreciate why this is a problem needing a solution, with the code above as an example:
      • the function first_or takes two references but returns only one, and which input it is borrowed from depends entirely on runtime logic. The two inputs may have different lifetimes, and Rust doesn't look into the function body to work out how the output's lifetime relates to them; it relies on the signature alone. So the programmer has to provide that information. A topic for later.
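    Just as a peek ahead, here's a sketch of one way the annotated version could look, assuming we tie both inputs and the output to a single lifetime 'a (the simplest choice, not the only one):

```rust
// `'a` says: the returned reference may only be used while BOTH inputs are
// still valid, whichever one it actually came from at runtime.
fn first_or<'a>(strings: &'a Vec<String>, default: &'a String) -> &'a String {
    if !strings.is_empty() {
        &strings[0]
    } else {
        default
    }
}

fn main() {
    let default = String::from("(none)");
    let empty: Vec<String> = Vec::new();
    let full = vec![String::from("first"), String::from("second")];

    println!("{}", first_or(&full, &default)); // first
    println!("{}", first_or(&empty, &default)); // (none)
}
```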