{"id":511,"date":"2023-03-11T20:21:52","date_gmt":"2023-03-12T00:21:52","guid":{"rendered":"https:\/\/www.fitzsim.org\/blog\/?p=511"},"modified":"2023-03-11T20:21:52","modified_gmt":"2023-03-12T00:21:52","slug":"llama-cpp-and-power9","status":"publish","type":"post","link":"https:\/\/www.fitzsim.org\/blog\/?p=511","title":{"rendered":"llama.cpp and POWER9"},"content":{"rendered":"\n<p>This is a follow-up to my <a href=\"https:\/\/www.fitzsim.org\/blog\/?p=484\">prior post about whisper.cpp<\/a>.  Georgi Gerganov has adapted his GGML framework to run the recently-circulating LLaMA weights.  The PPC64 optimizations I made for whisper.cpp seem to carry over directly; after updating my Talos II&#8217;s PyTorch installation, I was able to get <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\">llama.cpp<\/a> generating text from a prompt &#8212; completely offline &#8212; using the LLaMA 7B model.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ .\/main -m .\/models\/7B\/ggml-model-q4_0.bin -t 32 -n 128 -p \"Hello world in Common Lisp\"\nmain: seed = 1678578687\nllama_model_load: loading model from '.\/models\/7B\/ggml-model-q4_0.bin' - please wait ...\nllama_model_load: n_vocab = 32000\nllama_model_load: n_ctx   = 512\nllama_model_load: n_embd  = 4096\nllama_model_load: n_mult  = 256\nllama_model_load: n_head  = 32\nllama_model_load: n_layer = 32\nllama_model_load: n_rot   = 128\nllama_model_load: f16     = 2\nllama_model_load: n_ff    = 11008\nllama_model_load: n_parts = 1\nllama_model_load: ggml ctx size = 4529.34 MB\nllama_model_load: memory_size =   512.00 MB, n_mem = 16384\nllama_model_load: loading model part 1\/1 from '.\/models\/7B\/ggml-model-q4_0.bin'\nllama_model_load: .................................... 
done\nllama_model_load: model size =  4017.27 MB \/ num tensors = 291\n\nmain: prompt: 'Hello world in Common Lisp'\nmain: number of tokens in prompt = 7\n     1 -> ''\n 10994 -> 'Hello'\n  3186 -> ' world'\n   297 -> ' in'\n 13103 -> ' Common'\n 15285 -> ' Lis'\n 29886 -> 'p'\n\nsampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000\n\n\nHello world in Common Lisp!\nWe are going to learn the very basics of Common Lisp, an open source lisp implementation, which is a descendant of Lisp1.\nCommon Lisp is the de facto standard lisp implementation of Mozilla Labs, who are using it to create modern and productive lisps for Firefox.\nWe are going to start by having a look at its implementation of S-Expressions, which are at the core of how Common Lisp implements its lisp features.\nThen, we will explore its other features such as I\/O, Common Lisp has a really nice and modern I\n\nmain: mem per token = 14828340 bytes\nmain:     load time =  1009.64 ms\nmain:   sample time =   334.95 ms\nmain:  predict time = 86867.07 ms \/ 648.26 ms per token\nmain:    total time = 90653.54 ms<\/code><\/pre>\n\n\n\n<p>The above example was just the first thing I tried, with no tuning or prompt engineering. As Georgi notes in his README, don&#8217;t judge the model by the above output; it was just a quick test. The text is printed as soon as each token is predicted, at a rate of about one word per second, which makes the generation interesting to watch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a follow-up to my prior post about whisper.cpp. Georgi Gerganov has adapted his GGML framework to run the recently-circulating LLaMA weights. 
The PPC64 optimizations I made for whisper.cpp seem to carry over directly; after updating my Talos II&#8217;s PyTorch installation, I was able to get llama.cpp generating text from a prompt &#8212; completely &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.fitzsim.org\/blog\/?p=511\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;llama.cpp and POWER9&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-511","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"_links":{"self":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=511"}],"version-history":[{"count":4,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/511\/revisions"}],"predecessor-version":[{"id":515,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/511\/revisions\/515"}],"wp:attachment":[{"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fitzsim.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=511"}],"curies":[{"name":"wp","hre
f":"https:\/\/api.w.org\/{rel}","templated":true}]}}