(Replying to PARENT post)
Back in the ancient days, I worked at IBM doing benchmarking for an OS project that was never released. We were using PPC601 Sandalfoots (Sandalfeet?) as dev machines. A perennial fight was devs writing their own memcpy using *dst++ = *src++ loops rather than the one in the library, which was written by one of my coworkers and consisted of 3 pages of assembly that used at least 18 registers.
The simple loop was something like X cycles/byte, while the library version was P + (Q cycles/byte) but the difference was such that the crossover point was about 8 bytes. So, scraping out the simple memcpy implementations from the code was about a weekly thing for me.
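To spell out the crossover arithmetic: X, P and Q are the symbols from the paragraph above, and n (the copy length in bytes) is a name I'm adding for illustration. The two costs are equal where the fixed overhead P has been paid back by the cheaper per-byte rate:

```latex
X \cdot n = P + Q \cdot n
\quad\Longrightarrow\quad
n^{*} = \frac{P}{X - Q} \qquad (X > Q)
```

With a crossover of about 8 bytes, that means P was roughly 8(X - Q) cycles; past that length the library version wins.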
At this point, we discovered that our C compiler would pass structs by value (this was the early-ish days of ANSI C, and it came as a surprise to some of my older coworkers) and benchmarked that.
And discovered that its copy code was worse than the simple *dst++ = *src++ loops. By about a factor of 4. (The simple loop would be optimized to work with word-sized ints, while the compiler was generating code that copied each byte individually.)
If you are doing something where this matters, something like VTune is very important. So is the ability to convince people who do stupid things to stop doing the stupid things.
(Replying to PARENT post)
I don't think you ever have to write code like this. Implement your math traits for both value and reference types, like the standard library does.
Go down to Trait Implementations for scalar types, for instance i32 [1]
impl Add<&i32> for &i32
impl Add<&i32> for i32
impl Add<i32> for &i32
impl Add<i32> for i32
Once you do that your ergonomics should be exactly the same as with built in scalar types.
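A minimal sketch of what that looks like for a user-defined type. `Vec3` here is a hypothetical three-float struct standing in for whatever math type you have, not the type from the original post. One impl does the work and the three reference combinations forward to it:

```rust
use std::ops::Add;

// Hypothetical math type, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

// The one "real" implementation: value + value.
impl Add<Vec3> for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 {
        Vec3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z }
    }
}

// The reference combinations dereference (Vec3 is Copy) and forward.
impl Add<&Vec3> for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: &Vec3) -> Vec3 { self + *rhs }
}

impl Add<Vec3> for &Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 { *self + rhs }
}

impl Add<&Vec3> for &Vec3 {
    type Output = Vec3;
    fn add(self, rhs: &Vec3) -> Vec3 { *self + *rhs }
}

fn main() {
    let a = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vec3 { x: 10.0, y: 20.0, z: 30.0 };
    let expected = Vec3 { x: 11.0, y: 22.0, z: 33.0 };

    // All four spellings compile and agree.
    assert_eq!(a + b, expected);
    assert_eq!(a + &b, expected);
    assert_eq!(&a + b, expected);
    assert_eq!(&a + &b, expected);
    println!("{:?}", &a + &b);
}
```

For several operators (Sub, Mul, ...) people usually generate the forwarding impls with a small macro rather than writing them by hand.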
(Replying to PARENT post)
Lots of criticism of my methodology in the comments here. That's fine. That post was more of a self nerd snipe that went way deeper than I expected.
I hoped that my post would lead to a more definitive answer from some actual experts in the field. Unfortunately that never happened, afaik. Bummer.
(Replying to PARENT post)
The other question I have is which style should you use when writing a library? It's obviously not possible to benchmark all the software that will call your library but you still want to consider readability, performance as well as other factors such as common convention.
(Replying to PARENT post)
The clarity of the code using a particular library is such a big (but often under-appreciated) benefit that I would heavily lean in this direction when considering interface options. My 2c.
(Replying to PARENT post)
* Sprinkling & around everything in math expressions does make them ugly. Maybe Rust needs an asBorrow or similar?
* If you inline everything then the speed is the same.
* Link time optimizations are also an easy win.
(Replying to PARENT post)
References may get optimized to copies where possible and sound (i.e. blittable and const); a common heuristic involves the size of a cache line (64b on most modern ISAs, including x86_64).
Using a Vector4 would have pushed the structure size beyond the 64b heuristic. You would also need to disable inlining for the measured methods.
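For the "disable inlining" part, here is a rough sketch of how one might set that up in Rust. The `Vec3` type, the `dot_*` functions and the iteration count are all made up for illustration, not taken from the original benchmark. `#[inline(never)]` keeps the calls as real calls, and `std::hint::black_box` keeps the optimizer from constant-folding the inputs:

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical 3 x f32 struct, standing in for the one being measured.
#[derive(Clone, Copy)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

// inline(never) forces an actual call so the argument passing is measured.
#[inline(never)]
fn dot_by_copy(a: Vec3, b: Vec3) -> f32 {
    a.x * b.x + a.y * b.y + a.z * b.z
}

#[inline(never)]
fn dot_by_borrow(a: &Vec3, b: &Vec3) -> f32 {
    a.x * b.x + a.y * b.y + a.z * b.z
}

fn main() {
    let a = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vec3 { x: 4.0, y: 5.0, z: 6.0 };
    const ITERS: u32 = 1_000_000; // arbitrary; raise it for steadier numbers

    let start = Instant::now();
    let mut sum = 0.0f32;
    for _ in 0..ITERS {
        // black_box hides the values from the optimizer.
        sum += dot_by_copy(black_box(a), black_box(b));
    }
    black_box(sum);
    let by_copy = start.elapsed();

    let start = Instant::now();
    let mut sum = 0.0f32;
    for _ in 0..ITERS {
        sum += dot_by_borrow(black_box(&a), black_box(&b));
    }
    black_box(sum);
    let by_borrow = start.elapsed();

    println!("by-copy: {:?}, by-borrow: {:?}", by_copy, by_borrow);
}
```

Build with `--release`, and treat the numbers as a hint only. A hand-rolled loop like this is noisy, and a proper benchmark harness would do warmup and statistics for you.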
(Replying to PARENT post)
It does sometimes matter though. One optimization I've seen in a few places is to box the error type, so that a result doesn't copy the (usually empty) error by value on the stack. That actually makes a small performance difference, on the order of about 5-10%.
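A small sketch of the boxed-error idea, under the assumption of a deliberately large error type. `BigError` and the two `parse_*` functions are hypothetical names for illustration. It only demonstrates the size difference of the `Result`, not the speedup:

```rust
use std::mem::size_of;

// Hypothetical error type, made large on purpose.
struct BigError {
    code: u32,
    context: [u8; 256],
}

// Error stored inline: every Result is as big as the error itself.
fn parse_inline(input: &str) -> Result<u32, BigError> {
    input
        .parse()
        .map_err(|_| BigError { code: 1, context: [0; 256] })
}

// Error boxed: the Result only carries a pointer on the error path,
// at the price of a heap allocation when an error actually occurs.
fn parse_boxed(input: &str) -> Result<u32, Box<BigError>> {
    input
        .parse()
        .map_err(|_| Box::new(BigError { code: 1, context: [0; 256] }))
}

fn main() {
    println!(
        "inline Result: {} bytes, boxed Result: {} bytes",
        size_of::<Result<u32, BigError>>(),
        size_of::<Result<u32, Box<BigError>>>()
    );

    assert_eq!(parse_inline("42").ok(), Some(42));
    if let Err(e) = parse_boxed("not a number") {
        println!("error code {}, {} bytes of context", e.code, e.context.len());
    }
}
```

The trade-off is the usual one: the happy path moves less data around, while the error path pays for an allocation.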
(Replying to PARENT post)
The cost of by-value lies in memory copies, while the cost of by-reference lies in dereferencing pointers where the values are needed, which might mean many more memory reads are needed than with by-value (depends on what you're doing). So it's just hard to tell which will do better in general -- there's no answer to that.
For a library, maybe providing by-value and by-reference interfaces should be good (except that will bloat the library). For everything else just use by-value as it has the best ergonomics.
(Replying to PARENT post)
Rust - By-Copy: 14124, By-Borrow: 8150
C++ - By-Copy: 12160, By-Ref: 11423
P.S. Just built it using LLVM under CLion IDE and the results are:
G:\temp\cpp\rust-cpp-bench\cpp\cmake\cmake-build-release\fts_cmake_cpp_bench.exe
Totals:
Overlaps: 220384338
By-Copy: 4397
By-Ref: 4396 Delta: -0.0227428%
Process finished with exit code 0
(Replying to PARENT post)
But at the risk of loss of respect, I'll wait for Rust2ShinyNewLanguage to solve this.
All I know is I hope I'm smart enough to understand ShinyNewLanguage's compiler. Or maybe even build it.
I've got several projects that could use some additional Boxes of structures, and borrow instead of move, and maybe a few more complex reference counting mechanics.
Rust forced me to understand what that meant. That's good for building a better engineer.
But it's not fun to work with.
I hope the next experience is better. Sorry Rustaceans.
(Replying to PARENT post)
The benchmark made here could completely fall apart once more threads are added.
Modern computer architectures are non-uniform in terms of any kind of memory accesses. The same logical operations can have extremely varied costs depending on how the whole program flow goes.
(Replying to PARENT post)
Anyway, assuming it's not inlined I would guess pass-by-copy, maybe with an occasional exception in code with heavy register pressure.
Edit: Actually, since it's a structure, the calling convention is to allocate it in memory and pass a pointer, doh. So it should actually compile the same.
(Replying to PARENT post)
Also, whenever you do one of these, please post the full source with it. There's no reason to leave your readers in the dark, wondering what could be going on, which is exactly what I'm doing now, because there's almost no excuse for C++ to be slower in a task than Rust. It's just a matter of how much work you need to put in to make it get there.
(Replying to PARENT post)
Performance-wise, if you're likely to touch every element in a type anyway, err on the side of copies. They are going to have to end up in registers eventually anyway, so you might as well let the caller find out the best way to put them there.
(Replying to PARENT post)
But also, the struct is 3x32 bits, and Rust can derive the Copy trait for it. It is barely larger than a u64, which is the size of the reference.
But life is only simpler when Copy and Clone can be auto-implemented.
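To put numbers on the size comparison, here is a quick sketch. `Vec3` and `length_squared` are hypothetical, assuming three `f32` fields as described above:

```rust
use std::mem::size_of;

// Hypothetical 3 x 32-bit struct. Copy and Clone are derived, which only
// works because every field is itself Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

// Takes the struct by value; with Copy the caller keeps its original.
fn length_squared(v: Vec3) -> f32 {
    v.x * v.x + v.y * v.y + v.z * v.z
}

fn main() {
    // 12 bytes for the struct; a reference is pointer-sized
    // (8 bytes on a 64-bit target).
    println!(
        "Vec3: {} bytes, &Vec3: {} bytes",
        size_of::<Vec3>(),
        size_of::<&Vec3>()
    );

    let v = Vec3 { x: 1.0, y: 2.0, z: 2.0 };
    let len2 = length_squared(v);
    // v is still usable here because it was copied, not moved.
    println!("{:?} has squared length {}", v, len2);
}
```

A struct holding a `String` or `Vec` could not derive `Copy`, which is where the by-value ergonomics stop being free.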
(Replying to PARENT post)
Should small Rust structs be passed by-copy or by-borrow? - https://news.ycombinator.com/item?id=20798033 - Aug 2019 (107 comments)
(Replying to PARENT post)
Unless you are gonna benchmark something, for details like this you should pretty much always just trust the damn compiler and write the code in the most maintainable way.
This comes up in code review a LOT at my work:
- "you can write this simpler with XYZ"
- "but that will be slower because it's a copy/a function call/an indirect branch/a channel send/a shared memory access/some other combination of assumptions about what the compiler will generate and what is slow on a CPU"
I always ask them to either prove it or write the simple thing. If the code in question isn't hot enough to bother benchmarking it, the performance benefits probably aren't worth it _even if they exist_.