Yes, sir!

(Photo: turned-on white YES LED signage, by Julian Lonzano)

What’s the simplest Unix command you know? For one, there’s echo, which prints a string to stdout, and true, which always exits with a status of zero.

Among these simple Unix commands, there’s also yes. If you run it without arguments, you get an infinite stream of y’s, each followed by a newline:

y
y
y
y
(...you get the idea)
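To pin down the output format, here is a small Go sketch of what yes prints. It assumes GNU behavior for arguments (they are repeated, joined by single spaces), which the writev version later in this post also implements. The line count n is a hypothetical parameter that only exists so the sketch terminates.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// yesLine returns the line that yes repeats: "y" when called without
// arguments, otherwise the arguments joined by single spaces (GNU behavior),
// always terminated by a newline.
func yesLine(args []string) string {
	if len(args) == 0 {
		return "y\n"
	}
	return strings.Join(args, " ") + "\n"
}

// firstLines returns the first n lines of yes output. The real command never
// stops; the hypothetical n parameter only exists so the sketch terminates.
func firstLines(args []string, n int) string {
	return strings.Repeat(yesLine(args), n)
}

func main() {
	// Behaves like `yes [args...] | head -n 4`.
	fmt.Print(firstLines(os.Args[1:], 4))
}
```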

But what’s the use of this command? Well, here’s one example:

yes | sh boring_installation.sh

Ever installed a program that required you to type “y” and hit enter to keep going? yes to the rescue! It will carefully fulfill its duty, so you can keep playing Solitaire.

Writing a yes clone

It must be pretty easy to write a yes clone, right? Let’s try it in Python:

while True:
    print("y")

This works, but it’s not really efficient.

python yes.py | pv -r > /dev/null
[6.21MiB/s]

Let’s try it in Go:

yes_simple.go
package main

import "os"

func main() {
	for {
		// os.Stdout is unbuffered: each call is its own 2-byte write syscall.
		os.Stdout.Write([]byte("y\n"))
	}
}
go build -o yes yes_simple.go
./yes | pv -r > /dev/null
[1.30MiB/s]

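That doesn’t look any better. It’s even slower than the Python version! The reason is that Go’s os.Stdout is unbuffered: every iteration issues a separate write syscall for just two bytes, whereas Python buffers its output when stdout is not a terminal. Looking at the source code of [GNU coreutils](https://github.com/coreutils/coreutils/commits/master/src/yes.c), we can see this: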

while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

So they fill a buffer with as many complete copies of the line as fit and write the whole buffer with each call. The buffer size is based on BUFSIZ, a constant that each platform’s C library picks to make I/O efficient. On my system, this is 1KB. Let’s try to implement this in Go:

yes_buffered.go
package main

import (
	"bytes"
	"os"
)

func main() {
	// 4096 copies of "y\n" = 8KB, written with one syscall per iteration.
	buf := bytes.Repeat([]byte("y\n"), 1024*4)
	for {
		n, err := os.Stdout.Write(buf)
		if err != nil {
			panic(err)
		}
		if n != len(buf) {
			panic("short write")
		}
	}
}

This creates one 8KB buffer (4096 copies of “y\n”), which is written to stdout in every iteration of the loop. So each syscall now writes 8KB instead of just 2 bytes. The total number of bytes copied into the kernel stays the same, but the fixed cost of calling into the kernel is now paid once per 4096 lines instead of once per line.

go build -o yes yes_buffered.go
./yes | pv -r > /dev/null
[2.31GiB/s]
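A rough back-of-the-envelope check, assuming every write call moves exactly one full buffer (2 bytes before, 8KB now) and taking pv’s rates at face value: dividing throughput by bytes per call gives the number of write syscalls per second.

```latex
\begin{aligned}
\text{unbuffered:}\quad \frac{1.30\,\text{MiB/s}}{2\,\text{B}} &= 1.30 \cdot 2^{19}\,\text{s}^{-1} \approx 6.8 \times 10^{5}\ \text{writes/s} \\
\text{buffered:}\quad \frac{2.31\,\text{GiB/s}}{8\,\text{KiB}} &= 2.31 \cdot 2^{17}\,\text{s}^{-1} \approx 3.0 \times 10^{5}\ \text{writes/s}
\end{aligned}
```

So the buffered version makes fewer than half as many syscalls per second, yet moves about 1,800 times as much data (2.31 GiB/s ÷ 1.30 MiB/s ≈ 1,820). That is consistent with the fixed cost per call, not the copying of bytes, having been the bottleneck.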

Looks better. But in terms of raw throughput, we can still improve on that.

Enter vectorized I/O

The writev function, short for “write vector”, is a system call in Unix-like operating systems that writes multiple buffers, which need not be contiguous in memory, to a file descriptor in a single call. It takes an array of iovec structures, each describing one buffer by its start address and length, and writes the buffers in array order, as if they had been concatenated. This is useful when data is scattered across memory: instead of issuing one write per buffer, or first copying everything into one contiguous buffer, a program can hand all the pieces to the kernel at once and pay the syscall overhead only once.

On my Mac, I had to jump through a few hoops to get this to work in a future-proof way: Go has frozen its syscall package in favor of golang.org/x/sys/unix, which on macOS wraps libc functions instead of making syscalls directly, because Apple only guarantees ABI compatibility at the libc level.

I’m planning to write a tutorial on how to add a syscall to Go as well, but for now, just take a look at the code of the changelist. If you want to play with it, check out the code and point your go.mod at your local copy with a replace directive:

go.mod
require golang.org/x/sys v0.15.0
replace golang.org/x/sys => /Users/<username>/go/src/golang.org/x/sys

Now we can use the Writev syscall in Go:

yes_iovec.go
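package main

import (
	"fmt"
	"os"

	// Assumption: sysconf is github.com/tklauser/go-sysconf.
	"github.com/tklauser/go-sysconf"
	"golang.org/x/sys/unix"
)

// yes is the text repeated when no arguments are given.
const yes = "y"

func main() {
	// IOV_MAX is the largest number of buffers a single writev call accepts.
	iov_max, err := sysconf.Sysconf(sysconf.SC_IOV_MAX)
	if err != nil {
		panic(err)
	}
	// One page worth of complete lines; every iovec below points at it.
	buf := make([]byte, unix.Getpagesize())
	iovecs := make([][]byte, 0, 1024)
	totalLen := 0
	if len(os.Args) < 2 {
		totalLen = fill(buf, yes)
	} else {
		totalLen = fill(buf, os.Args[1:]...)
	}
	for i := 0; i < int(iov_max); i++ {
		iovecs = append(iovecs, buf[0:totalLen])
	}
	for {
		// Short writes are ignored here; EAGAIN and EINTR are simply retried.
		_, err := unix.Writev(unix.Stdout, iovecs)
		if err != nil && err != unix.EAGAIN && err != unix.EINTR {
			fmt.Fprintf(os.Stderr, "writev: %v\n", err)
			os.Exit(1)
		}
	}
}

// fill writes as many complete lines as fit into buf, each line being the
// filler strings separated by spaces and terminated by '\n'. It returns the
// number of bytes used, so buf[:n] always ends on a line boundary.
func fill(buf []byte, filler ...string) int {
	itemSize := 0
	for _, f := range filler {
		itemSize += len(f)
	}
	itemSize += len(filler) // one separator (space or newline) per string
	i := 0
	for i = 0; i+itemSize < len(buf); {
		for _, f := range filler {
			copy(buf[i:], f)
			i += len(f)
			buf[i] = ' '
			i++
		}
		buf[i-1] = '\n' // replace the last space with the line terminator
	}
	return i
}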
go build -o yes yes_iovec.go
./yes | pv -r > /dev/null
[3.71GiB/s]

Now why is that faster? Well, the writev syscall can write multiple buffers in one syscall. Our buffer is still small (a single memory page, as returned by unix.Getpagesize()), but it is now referenced from IOV_MAX iovec entries (1024 on macOS and Linux), so one syscall writes 1024 copies of it: several megabytes instead of 8KB. Isn’t that cool?
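One detail the loop above glosses over: like write, writev may return after writing fewer bytes than requested (for example when interrupted by a signal), and the loop ignores the count. Because the next call starts again at the beginning of the first iovec, a short write in the middle of a line would produce a garbled line. A minimal sketch of the bookkeeping, with advance as a hypothetical helper name:

```go
package main

import "fmt"

// advance returns the part of iovs that still has to be written after a
// successful, possibly short, writev call reported n bytes written. Fully
// written buffers are dropped; a partially written one is re-sliced so the
// next call resumes exactly where the kernel stopped. n must come from a
// successful call; for n <= 0 the list is returned unchanged.
func advance(iovs [][]byte, n int) [][]byte {
	for len(iovs) > 0 && n >= len(iovs[0]) {
		n -= len(iovs[0])
		iovs = iovs[1:]
	}
	if len(iovs) == 0 || n <= 0 {
		return iovs
	}
	// Copy the list of slice headers before trimming the first entry, so the
	// caller's full iovec list (reused for every round) stays intact.
	rest := make([][]byte, len(iovs))
	copy(rest, iovs)
	rest[0] = rest[0][n:]
	return rest
}

func main() {
	full := [][]byte{[]byte("y\ny\n"), []byte("y\ny\n")}
	// Pretend writev reported a short write of 5 bytes: the first buffer and
	// the first "y" of the second one made it out.
	rest := advance(full, 5)
	fmt.Printf("still to write: %q\n", rest)
}
```

In the writev loop, one would keep a pending slice that starts as iovecs, call unix.Writev(unix.Stdout, pending), and on success set pending = advance(pending, n), starting over with the full iovecs once pending is empty. On error, n is not a byte count and should not be passed to advance.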

Lessons learned

writev and friends are a great way to improve the performance of I/O-bound applications, even when all the data comes from a single contiguous buffer. There are a few drawbacks, though: for starters, writev is not necessarily available on all platforms. Additionally, the time to first byte is a bit higher now, because we have to fill the buffer and set up the iovec array first.

So, while this was an interesting learning experience, especially regarding implementing syscall wrappers for Go, I think there’s hardly a use case for printing gigabytes of y’s to stdout in the shortest amount of time.


