{"id":484,"date":"2023-02-15T18:25:55","date_gmt":"2023-02-15T22:25:55","guid":{"rendered":"https:\/\/www.fitzsim.org\/blog\/?p=484"},"modified":"2023-02-15T22:23:30","modified_gmt":"2023-02-16T02:23:30","slug":"whisper-cpp-and-power9","status":"publish","type":"post","link":"https:\/\/www.fitzsim.org\/blog\/?p=484","title":{"rendered":"whisper.cpp and POWER9"},"content":{"rendered":"\n<p>I saw <a href=\"https:\/\/github.com\/ggerganov\/whisper.cpp\">whisper.cpp<\/a>\nmentioned on Hacker News and I was intrigued. whisper.cpp takes an\naudio file as input, transcribes speech, and prints the output to the\nterminal.  For some time I wanted to see how machine learning projects\nperformed on my POWER9 workstation, and how hard they would be to get\nrunning.  whisper.cpp had several properties that were interesting to\nme.<\/p>\n\n\n\n<p>First, it is freely licensed, released under the MIT license and\nit uses the <a href=\"https:\/\/github.com\/openai\/whisper\">OpenAI\nWhisper<\/a> model whose weights are also released under the MIT\nlicense.  Second, whisper.cpp is a very compact C\/C++ project with no\nframework dependencies.  Finally, after the code and the model are\ndownloaded, whisper.cpp runs completely offline, so it is inherently\nprivacy-respecting.<\/p>\n\n\n\n<p>There was one tiny build issue, but otherwise, it just built and\nran on PPC64.  I was expecting to need dependent libraries and so\nforth, but the code was extremely portable.  However, I knew it was\nrunning much slower than it could.  A clue: the minor build failure was due to\na missing architecture-specific header for vector intrinsics\n(<code>immintrin.h<\/code>) that wasn&#8217;t available for ppc64le\nDebian.<\/p>\n\n\n\n<p>I took the opportunity to learn PPC64 vector intrinsics.  
Thanks to the OpenPOWER initiative, freely licensed, high-quality documentation was readily downloadable from <a href=\"https:\/\/openpowerfoundation.org\">https:\/\/openpowerfoundation.org<\/a> (no registration, paywalls, click-throughs, JS requirements, etc.).<\/p>\n\n\n\n<p>I did an initial implementation for POWER9 using the IBM\nVector-Scalar Extension (VSX) and the transcription speed improved\nconsiderably; for the base model, the example transcription ran in\nabout one tenth the time.  Meanwhile, the upstream project had\nreorganized its intrinsics support, so I reorganized my\nimplementation to fit in.  This was trickier than I expected, because\nof how FP32\/short packing and unpacking worked in VSX.<\/p>\n\n\n\n<p>Here is a graph of the results:<\/p>\n<a href=\"https:\/\/www.fitzsim.org\/screenshots\/whisper-vsx-graph-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-495\" src=\"https:\/\/www.fitzsim.org\/screenshots\/whisper-vsx-graph-1.png\" alt=\"A Bar Graph;\nTitle: whisper.cpp;\nSubtitle: PPC64 Performance Improvements;\nSubsubtitle: .\/extra\/bench-all.sh 32; 77226aa vs 3b010f9;\nY Axis Label: Encoding Duration (seconds);\nX Axis Label: Whisper Model;\nData Format: Model: Pre-VSX, Post-VSX;\nBar Data Follow:;\ntiny:    14.606,  1.283;\nbase:    33.438,  2.786;\nsmall:  110.570,  8.534;\nmedium: 311.653, 22.282;\nlarge:  692.425, 41.106;\" width=\"978\" height=\"758\"\/><\/a>\n\n\n\n<p>For the sake of completeness (and for my friends on\n#talos-workstation) I also added big endian support and confirmed that\nthe example ran on my PPC64BE virtual machine.<\/p>\n\n\n\n<p>I&#8217;m sure more optimizations are possible.  I may try OpenBLAS (CPU)\nand\/or ROCm (GPU) acceleration later.  So far everything is running on\nthe CPU.  
But I&#8217;m glad that, at least for the inference side, the\nWhisper model can attain reasonable performance on owner-controlled\nhardware like the Talos II.<\/p>\n\n\n\n<p>One potential downside of Whisper&#8217;s trained-model approach (vs other transcription approaches, like <a href=\"https:\/\/github.com\/julius-speech\/julius\">Julius<\/a>) is that for downstream projects, the model is pretty much unfixable if it has an issue.  I have run whisper.cpp on real world materials with excellent results, especially with the large model.  But if there are bugs, I don&#8217;t think fixing them is possible without retraining the model, which at least for Whisper, seems beyond the means of individuals.<\/p>\n\n\n\n<p><em>I would like to thank Matt Tegelberg for evaluating whisper.cpp&#8217;s results against real world audio and for proof-reading this post.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I saw whisper.cpp mentioned on Hacker News and I was intrigued. whisper.cpp takes an audio file as input, transcribes speech, and prints the output to the terminal. For some time I wanted to see how machine learning projects performed on my POWER9 workstation, and how hard they would be to get running. 
whisper.cpp had several &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.fitzsim.org\/blog\/?p=484\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;whisper.cpp and POWER9&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-484","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"_links":{"self":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/484","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=484"}],"version-history":[{"count":22,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/484\/revisions"}],"predecessor-version":[{"id":508,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/484\/revisions\/508"}],"wp:attachment":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=484"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=484"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=484"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}