
Should I take `self` by value or mutable reference when using the Builder pattern?

Tags:

design-patterns

rust

ownership

So far, I've seen two builder patterns in official Rust code and other crates:

impl DataBuilder {
    pub fn new() -> DataBuilder { ... }
    pub fn arg1(&mut self, arg1: Arg1Type) -> &mut DataBuilder { ... }
    pub fn arg2(&mut self, arg2: Arg2Type) -> &mut DataBuilder { ... }
    ...
    pub fn build(&self) -> Data { ... }
}

impl DataBuilder {
    pub fn new() -> DataBuilder { ... }
    pub fn arg1(self, arg1: Arg1Type) -> DataBuilder { ... }
    pub fn arg2(self, arg2: Arg2Type) -> DataBuilder { ... }
    ...
    pub fn build(self) -> Data { ... }
}

I'm writing a new crate and I'm a bit confused which pattern I should choose. I know it will be painful if I change some APIs later, so I want to make the decision now.

I understand the semantic difference between them, but which one should we prefer in practical situations? Or how should we choose between them? Why?


asked Dec 19 '21 01:12

Sprite

1 Answer

Is it beneficial to build multiple values from the same builder?

If yes, use &mut self
If no, use self

Consider std::thread::Builder, which is a builder for spawning a new thread (its .spawn() returns a JoinHandle). It uses Option fields internally to configure how to build the thread:

pub struct Builder {
    name: Option<String>,
    stack_size: Option<usize>,
}

It uses self to .spawn() the thread because it needs ownership of the name. It could theoretically use &mut self and .take() the name out of the field, but then a second call to .spawn() would not produce the same result as the first, which makes for a surprising API. It could instead .clone() the name, but then there's an additional and often unneeded cost to spawn a thread. Using &mut self would be a detriment here.
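To make the trade-off above concrete, here is a minimal sketch of the three options. Worker and WorkerBuilder are hypothetical names invented for illustration; this is not how std::thread::Builder is actually implemented, only a model of the same ownership question.

```rust
/// Hypothetical product type; it just stores the optional name.
#[derive(Debug, PartialEq)]
struct Worker {
    name: Option<String>,
}

/// Hypothetical builder that owns an optional name, like the thread builder.
struct WorkerBuilder {
    name: Option<String>,
}

impl WorkerBuilder {
    fn new() -> Self {
        WorkerBuilder { name: None }
    }

    fn name(mut self, name: &str) -> Self {
        self.name = Some(name.to_string());
        self
    }

    /// Option 1: take `self` by value. The name is moved, nothing is
    /// cloned, and the builder cannot be used again afterwards.
    fn build(self) -> Worker {
        Worker { name: self.name }
    }

    /// Option 2: `&mut self` plus `take()`. No clone, but the builder is
    /// silently emptied, so a second call produces a different result.
    fn build_taking(&mut self) -> Worker {
        Worker { name: self.name.take() }
    }

    /// Option 3: `&self` plus `clone()`. Repeatable, but every call pays
    /// for a copy of the name even if the builder is only used once.
    fn build_cloning(&self) -> Worker {
        Worker { name: self.name.clone() }
    }
}

/// Shows the surprising behaviour of option 2.
fn taking_twice() -> (Worker, Worker) {
    let mut builder = WorkerBuilder::new().name("worker");
    let first = builder.build_taking();
    let second = builder.build_taking(); // the name is already gone here
    (first, second)
}
```

With option 2 the first Worker gets the name and the second gets None, which is the "not identical results" problem described above.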

Consider std::process::Command, which serves as a builder for a std::process::Child. It has fields containing the program, args, environment, and pipe configuration:

pub struct Command {
    program: CString,
    args: Vec<CString>,
    env: CommandEnv,
    stdin: Option<Stdio>,
    stdout: Option<Stdio>,
    stderr: Option<Stdio>,
    // ...
}

It uses &mut self to .spawn() because it does not take ownership of these fields to create the Child. It has to internally copy all that data over to the OS anyway, so there's no reason to consume self. There's also a tangible benefit and use-case to spawning multiple child processes with the same configuration.
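As a rough illustration of that reuse, the sketch below configures one Command and spawns it several times. It assumes an echo executable is available on PATH (true on typical Unix-like systems, not guaranteed elsewhere); run_repeatedly is a hypothetical helper name.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Spawns the same configured command `times` times and collects the
/// trimmed stdout of each child process.
fn run_repeatedly(times: usize) -> io::Result<Vec<String>> {
    // Assumption: an `echo` program can be found on PATH.
    let mut cmd = Command::new("echo");
    // The configuration methods take `&mut self` and return `&mut Command`.
    cmd.arg("hello").stdout(Stdio::piped());

    let mut outputs = Vec::new();
    for _ in 0..times {
        // `spawn` only borrows the command, so it can be called again.
        let child = cmd.spawn()?;
        let output = child.wait_with_output()?;
        outputs.push(String::from_utf8_lossy(&output.stdout).trim().to_string());
    }
    Ok(outputs)
}
```

Because the builder is never consumed, the loop needs no clone of the configuration between iterations.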

Consider std::fs::OpenOptions, which serves as a builder for std::fs::File. It only stores basic configuration:

pub struct OpenOptions {
    read: bool,
    write: bool,
    append: bool,
    truncate: bool,
    create: bool,
    create_new: bool,
    // ...
}

Its configuration methods take &mut self, and .open() only borrows the builder (in the standard library it takes &self), because it does not need ownership of anything to work. It is somewhat similar to the thread builder, since there is a path associated with a file just as there is a name associated with a thread. However, the file path is only passed in to .open() and is not stored along with the builder. There's a use-case for opening multiple files with the same configuration.
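A small sketch of that use-case: one OpenOptions value is configured once and then used to open several files. write_logs and the file contents are hypothetical, chosen only for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Opens every file in `names` (inside `dir`) with the same options and
/// appends one line to each.
fn write_logs(dir: &Path, names: &[&str]) -> io::Result<()> {
    let mut options = OpenOptions::new();
    // Configure once; these methods take `&mut self`.
    options.append(true).create(true);

    for name in names {
        // The path is an argument to `open`, not part of the builder,
        // so the same options can be reused for every file.
        let mut file = options.open(dir.join(name))?;
        writeln!(file, "started")?;
    }
    Ok(())
}
```

Since the path is not stored in the builder, nothing has to be moved out of it, and borrowing is enough.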

The considerations above really only cover the semantics of self in the .build() method, but there's plenty of justification that whichever receiver you pick for .build(), you should use the same style for the intermediate methods as well:

API consistency
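chaining (&mut self) -> &mut Self into build(self) wouldn't compile, because the chain ends in a &mut reference and build(self) cannot move the builder out of it
using (self) -> Self with build(&mut self) would limit the flexibility of the builder to be reused long-term, because every configuration call moves it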
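For reference, here is a minimal sketch of the two consistent styles side by side. Data, RefBuilder, and OwnedBuilder are hypothetical types with made-up fields, not code from any particular crate.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Data {
    arg1: u32,
    arg2: String,
}

/// Style 1: `&mut self` throughout; the builder stays usable after `build`.
#[derive(Default)]
struct RefBuilder {
    arg1: u32,
    arg2: String,
}

impl RefBuilder {
    fn new() -> Self {
        Self::default()
    }
    fn arg1(&mut self, arg1: u32) -> &mut Self {
        self.arg1 = arg1;
        self
    }
    fn arg2(&mut self, arg2: &str) -> &mut Self {
        self.arg2 = arg2.to_string();
        self
    }
    /// Borrows the builder, so owned fields have to be cloned.
    fn build(&self) -> Data {
        Data { arg1: self.arg1, arg2: self.arg2.clone() }
    }
}

/// Style 2: `self` throughout; `build` consumes the builder.
#[derive(Default)]
struct OwnedBuilder {
    arg1: u32,
    arg2: String,
}

impl OwnedBuilder {
    fn new() -> Self {
        Self::default()
    }
    fn arg1(mut self, arg1: u32) -> Self {
        self.arg1 = arg1;
        self
    }
    fn arg2(mut self, arg2: &str) -> Self {
        self.arg2 = arg2.to_string();
        self
    }
    /// Moves the fields out, so no clone is needed.
    fn build(self) -> Data {
        Data { arg1: self.arg1, arg2: self.arg2 }
    }
}

/// Style 1 in use: two values from one builder.
fn build_two() -> (Data, Data) {
    let mut builder = RefBuilder::new();
    builder.arg1(1).arg2("first");
    let first = builder.build();
    builder.arg2("second");
    let second = builder.build();
    (first, second)
}

/// Style 2 in use: a single chained expression.
fn build_once() -> Data {
    OwnedBuilder::new().arg1(1).arg2("only").build()
}
```

If RefBuilder had a build(self) instead, the chain RefBuilder::new().arg1(1).build() would be rejected, since the value at the end of the chain is a &mut RefBuilder and the builder cannot be moved out of a mutable reference.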

answered Oct 24 '22 15:10

kmdreko
