llama.cpp and POWER9

This is a follow-up to my prior post about whisper.cpp. Georgi Gerganov has adapted his GGML framework to run the recently-circulating LLaMA weights. The PPC64 optimizations I made for whisper.cpp seem to carry over directly; after updating my Talos II's PyTorch installation (PyTorch is needed to convert the LLaMA weights to GGML format), I was able to get llama.cpp generating text from a prompt, completely offline, using the LLaMA 7B model.

$ ./main -m ./models/7B/ggml-model-q4_0.bin -t 32 -n 128 -p "Hello world in Common Lisp"
main: seed = 1678578687
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

main: prompt: 'Hello world in Common Lisp'
main: number of tokens in prompt = 7
     1 -> ''
 10994 -> 'Hello'
  3186 -> ' world'
   297 -> ' in'
 13103 -> ' Common'
 15285 -> ' Lis'
 29886 -> 'p'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000

Hello world in Common Lisp!
We are going to learn the very basics of Common Lisp, an open source lisp implementation, which is a descendant of Lisp1.
Common Lisp is the de facto standard lisp implementation of Mozilla Labs, who are using it to create modern and productive lisps for Firefox.
We are going to start by having a look at its implementation of S-Expressions, which are at the core of how Common Lisp implements its lisp features.
Then, we will explore its other features such as I/O, Common Lisp has a really nice and modern I

main: mem per token = 14828340 bytes
main:     load time =  1009.64 ms
main:   sample time =   334.95 ms
main:  predict time = 86867.07 ms / 648.26 ms per token
main:    total time = 90653.54 ms

The above example was just the first thing I tried, with no tuning or prompt engineering. As Georgi says in his README, don't judge the model by this output; it was just a quick test. Each token is printed as soon as it is predicted, at a rate of about one word per second, which makes the generation interesting to watch.

whisper.cpp and POWER9

I saw whisper.cpp mentioned on Hacker News and was intrigued. whisper.cpp takes an audio file as input, transcribes the speech in it, and prints the text to the terminal. For some time I had wanted to see how machine learning projects performed on my POWER9 workstation, and how hard they would be to get running. whisper.cpp had several properties that interested me.

First, it is freely licensed under the MIT license, and the OpenAI Whisper model weights it uses are also MIT-licensed. Second, whisper.cpp is a very compact C/C++ project with no framework dependencies. Finally, once the code and the model are downloaded, whisper.cpp runs completely offline, so it is inherently privacy-respecting.

There was one tiny build issue, but otherwise it just built and ran on PPC64. I expected to need dependent libraries and so forth, but the code was extremely portable. However, I knew it was running much slower than it could. A clue: the minor build failure was due to a missing architecture-specific header for vector intrinsics (immintrin.h), which is x86-specific and so not available on ppc64le Debian.

I took the opportunity to learn PPC64 vector intrinsics. Thanks to the OpenPOWER initiative, freely-licensed, high-quality documentation was readily downloadable from https://openpowerfoundation.org (no registration, paywalls, click-throughs, JS requirements, etc.).

I did an initial implementation for POWER9 using the IBM Vector-Scalar Extension (VSX) and the transcription speed improved considerably; for the base model, the example transcription ran in about one tenth the time. Meanwhile, the upstream project had re-organized its intrinsics support, so I reorganized my implementation to fit in. This was trickier than I expected, because of how FP32/short packing and unpacking worked in VSX.

Here is a graph of the results:

A Bar Graph;
Title: whisper.cpp;
Subtitle: PPC64 Performance Improvements;
Subsubtitle: ./extra/bench-all.sh 32; 77226aa vs 3b010f9;
Y Axis Label: Encoding Duration (seconds);
X Axis Label: Whisper Model;
Data Format: Model: Pre-VSX, Post-VSX;
Bar Data Follow:;
tiny:    14.606,  1.283;
base:    33.438,  2.786;
small:  110.570,  8.534;
medium: 311.653, 22.282;
large:  692.425, 41.106;

For the sake of completeness (and for my friends on #talos-workstation) I also added big endian support and confirmed that the example ran on my PPC64BE virtual machine.

I’m sure more optimizations are possible. I may try OpenBLAS (CPU) and/or ROCm (GPU) acceleration later. So far everything is running on the CPU. But I’m glad that, at least for the inference side, the Whisper model can attain reasonable performance on owner-controlled hardware like the Talos II.

One potential downside of Whisper’s trained-model approach (vs other transcription approaches, like Julius) is that for downstream projects, the model is pretty much unfixable if it has an issue. I have run whisper.cpp on real world materials with excellent results, especially with the large model. But if there are bugs, I don’t think fixing them is possible without retraining the model, which at least for Whisper, seems beyond the means of individuals.

I would like to thank Matt Tegelberg for evaluating whisper.cpp’s results against real world audio and for proof-reading this post.

Thunderbird and OpenPGP

I recently helped some friends set up Thunderbird and OpenPGP; the combination is much more user-friendly now.

OpenPGP is end-to-end encryption for email. Each user generates a private and public key. Each user imports a copy of the other user’s public key in their Thunderbird setup (they can copy the keys onto a USB drive or even email them to each other). Then when they select the “Encrypt” button during message composition, Thunderbird does the rest: no one on the Internet can read the message body. (The message metadata, like the subject line and the fact that the users are emailing each other, is still visible to Internet mail server administrators.)

The OpenPGP + Thunderbird user experience in 2022 is quite straightforward! I was worried I would need to use add-ons and external programs, but nope, it’s all built-in, including keypair generation. Public key import/export via the key manager is simple. OpenPGP is also nicely integrated into the reading and composition interfaces, which clearly indicate message signing and encryption status. Nice work by the Thunderbird team!

Mastodon and HTML

Request to Mastodon instance operators: Provide a read-only anonymous HTML-only mode.

Update 2022-11-17: Mastodon supports RSS; try just tacking “.rss” onto the end of a Mastodon URL.  It doesn’t seem to work for comment threads, but it does work for main threads. For example

M-x gnus ENTER G R https://mastodon.social/@markmccaughrean.rss ENTER

will create a Gnus group containing the author’s Mastodon posts.  This is a nice workaround, though I do still hope logged-out HTML-only browsing will be possible again, post 4.x.  Thanks to the helpful people on the #mastodon IRC channel for the above suggestion.

I’ve been following some Mastodon instances for several months. In Emacs, I type:

ESCAPE x eww ENTER https://mastodon.ar.al/@aral ENTER

and, without any authentication requirement, I’m greeted with a read-only HTML view of the instance, for example:

Example toot in Mastodon v3.5.3 HTML-only mode.

This week I tried another instance:

ESCAPE x eww ENTER https://mastodon.social/@markmccaughrean ENTER

and was blocked by:

Mastodon v4.0.0rc1 "please enable JavaScript" message.

Is this a new default? I was surprised that a read-only, anonymous, HTML-only mode (à la classic Twitter and Nitter) is not supported by all Mastodon instances.

uLisp on the SMART Response XE

The Lisp Badge mini computer has turned out to be quite useful and fun for little hardware hacking projects. Its designer, David Johnson-Davies, suggested that the SMART Response XE would make a good off-the-shelf uLisp computer, eliminating the need to build one from scratch.

I ordered a few SMART Response XEs from an auction site to see what was possible. I found Larry Bank's excellent Arduino library and fdufnews's schematics, which provided a great starting point. With guidance from David, I completed an initial uLisp port:

uLisp 4.1 running on the SMART Response XE

To load the code, I use an ISP programmer and a special PCB with POGO pins:

ISP POGO programming of the SMART Response XE

On Debian, I run:

make ispload

to load LispBadge.ino without a bootloader.

The SMART Response XE uses the ATmega128RFA1 microcontroller, which has a ZigBee IEEE 802.15.4 transceiver. David and I are discussing adding uLisp functions to make use of this capability.

Mezzano on Librebooted ThinkPads

I decided to try running Mezzano on real hardware. I figured my Librebooted ThinkPads would be good targets, since, thanks to Coreboot and the Linux kernel, I have reference source code for all the hardware.

On boot, these machines load Libreboot from SPI flash; included in this Libreboot image is GRUB, as a Coreboot payload.

Mezzano, on the other hand, uses the KBoot bootloader. I considered chainloading KBoot from GRUB, but I wondered if I could have GRUB load the Mezzano image directly, primarily to save a video mode switch.

I didn’t want to have to reflash the Libreboot payload on each modification (writing to SPI flash is slow and annoying to recover from if something goes wrong), so I tried building a GRUB module “out-of-tree” and loading it in the existing GRUB. Eventually I got this working, at which point I could load the module from a USB drive, allowing fast development iteration. (I realize out-of-tree modules are non-ideal so if there’s interest I may try to contribute this work to GRUB.)

The resulting GRUB module, mezzano.mod, is largely the KBoot Mezzano loader code, ported to use GRUB facilities for memory allocation, disk access, etc. It’s feature-complete, so I released it to Sourcehut. (I’ve only tested it on Libreboot GRUB, not GRUB loaded by other firmware implementations.)

Here’s a demo of loading Mezzano on two similar ThinkPads:

For ease of use, mezzano.mod supports directly loading the mezzano.image file generated by MBuild, instead of requiring that mezzano.image be dd'd to a disk. It does so by skipping the KBoot partitions to find the Mezzano disk image. The T500 in the video is booted this way. Alternatively, mezzano.mod can load the Mezzano disk image from a device, as is done for the W500 in the video. Both methods look for the Mezzano image magic, first at byte 0 and, failing that, just after the KBoot partitions.

I added the set-i8042-bits argument because Coreboot does not set these legacy bits, yet Mezzano’s PS/2 keyboard and mouse drivers expect them; at this point Mezzano does not have a full ACPI device tree implementation.

Excorporate 1.0.0

I released Excorporate 1.0.0 recently and declared the API stable. I was careful not to break API compatibility throughout Excorporate’s development so the API version stays the same at “0”.

The project is now in a state where it does everything I want it to do, API-wise. The UI is still missing features like meeting creation, but I just call the required Elisp functions when I need to, referring to the “API Usage” section of the Info manual.

I think there’s a lot of potential to create nice user interface features with Excorporate’s API — like a scheduler that shows people’s availability with ASCII-art bars, usable on a TTY. The included Org, diary and calfw front-ends show real-world usage of the API. I hope people send patches for new user interface features and keybindings, and contribute new authentication methods. I’ll continue watching for bug reports.

Quickly Start a Common Lisp Script

So you want to write a utility script, and you want to write it in Common Lisp. I created a template Common Lisp script called start.lisp. It's meant to be renamed and hacked up, but it provides a starting point for a new Common Lisp script, with some utility libraries included.

Here’s a “one-liner” that you can paste into a text editor, verify visually, then copy-n-paste from the editor into a terminal. This will get you up and running on major distros:

sudo apt install sbcl || \
sudo dnf install sbcl || \
sudo yum install sbcl && \
git clone --recursive https://git.sr.ht/~fitzsim/cl-starter-script && \
./cl-starter-script/start.lisp --help

It uses some shell tricks I found on the EmacsWiki, and a UIOP[1] feature ((uiop:argv0) with __CL_ARGV0) that I discovered via the impressive cl-launch project. It's too bad cl-launch isn't more widely packaged, since it seems like a good idea.

I may add more utility features to this template repository, but I'll also try to keep it simple and self-contained, meaning that after the initial git clone there's no need to go back to the Internet for more libraries.

This is only meant for utility scripts (and for me as a learning exercise for ASDF and Common Lisp packages). It is available in my Sourcehut, and mirrored to Microsoft GitHub.

For bigger projects, check out Quickproject, and for installing newer Common Lisp implementations than your operating system provides, consider Roswell.

Thanks to Didier Verna for help with ASDF and for quickly incorporating into CLON some new features I requested.

[1] “Utilities for Implementation- and OS- Portability”

Hosting Jitsi on ppc64le

I recently tried self-hosting Jitsi on Debian on the Talos II.

I had to apply some small workarounds for ppc64le, so I thought I’d post them here.

The cause of the first issue, no audio or video in a call, was reported in the Jitsi Videobridge log, /var/log/jitsi/jvb.log:

Exception in thread "Smack-Single Threaded Executor 0 (0)" java.lang.UnsatisfiedLinkError: /tmp/nativeutils5300293642203108/libjnisctp.so: /tmp/nativeutils5300293642203108/libjnisctp.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64 LE-bit platform)

To fix this, you have to build libjnisctp.so for ppc64le and substitute it for the x86-64 version of the library. Unless this is fixed upstream, you'll have to redo the steps starting from the wrapper=[...] line whenever you upgrade the jitsi-videobridge2 package.

git clone https://github.com/sctplab/usrsctp.git
git clone https://github.com/jitsi/jitsi-sctp
cd jitsi-sctp/
cp -R ../usrsctp usrsctp/
mvn package -DbuildSctp -DbuildNativeWrapper -DskipTests
wrapper="$(dpkg -L jitsi-videobridge2|grep jniwrapper-native)"
sudo cp "${wrapper}" "${wrapper}.bak"
cp "${wrapper}" ./tohack.jar
mkdir hacks
cd hacks
jar xf ../tohack.jar
cp ../jniwrapper/native/target/libjnisctp-linux-ppc64le.so .
jar cf hacked.jar *
sudo cp hacked.jar "${wrapper}"

I also had to help along the installation of luajwtjitsi, a dependency of Prosody, which at first errored out with:

sudo luarocks install luaossl
Installing https://luarocks.org/luaossl-20200709-0.src.rock

Error: Could not find library file for CRYPTO
  No file libcrypto.a in /usr/lib
  No file libcrypto.a in /usr/lib/x86_64-linux-gnu
  No file libcrypto.so in /usr/lib
  No file libcrypto.so in /usr/lib/x86_64-linux-gnu
  No file matching libcrypto.so.* in /usr/lib
  No file matching libcrypto.so.* in /usr/lib/x86_64-linux-gnu
You may have to install CRYPTO in your system and/or pass CRYPTO_DIR or CRYPTO_LIBDIR to the luarocks command.
Example: luarocks install luaossl CRYPTO_DIR=/usr/local

luarocks needs a hint as to the ppc64le library locations for luaossl, a dependency of luajwtjitsi.

sudo luarocks install luaossl \
	CRYPTO_LIBDIR=/usr/lib/powerpc64le-linux-gnu
sudo luarocks install luajwtjitsi

I’m impressed with Jitsi; its self-hosting documentation is straightforward, and once it is installed, the video bridge works smoothly. I didn’t do detailed comparisons, but video call quality seems as good as any of the centrally-run services I’ve used.

Pocket Lisp Computer

I recently built three Lisp Badge computers with some help from my kids. I bought a hot air soldering station and learned TQFP soldering. The kids did some through-hole and SMT soldering and really enjoyed it!

The hardware assembly and debugging process was really fun, other than worrying several times that I had put too much heat into a component, or set the wrong programmable fuse. During that phase I received some advice from the board’s designer, which really helped.

I’ve learned from the hardware people at work to always order extra parts, and I did, including an extra PCB. I was half expecting to damage stuff while learning, so I was really happy that we ended up with all three boards fully working, after locating and fixing some cold solder joints.

It was challenging as DIY projects go, since the Lisp Badge is not available as a kit. But ever since I saw the Technoblogy post about it, I knew I had to attempt building one, and it was worth it. Other than the display, compatible parts were all available from Digi-Key, and I got the PCBs from OSH Park.

The result is a really neat little computer. Here is a picture showing “big text” support that I added:

Three Lisp Badge computers displaying (lisp-badge) in large text split across the screens.

I also added support for building standalone on Debian (Arduino-Makefile), made the audio buzzer work, and wrote an example of how to play a tune on the buzzer. I published the changes to my sourcehut.

It’s fun to try writing small programs on the badge itself, within the constraints of its minimal uLisp REPL.