Upgrading to llama.cpp
The first time I tried llama.cpp
llama.cpp was one of the first inference servers available for running an LLM. At the time, I wasn’t especially impressed with AI, but LLMs were better than old-fashioned autocomplete. I didn’t think a subscription was worth paying for, but if I could run my own model, I’d take it.
When I first got to the llama.cpp repo on GitHub, running it still required compiling it from source. Annoying, but not the end of the world. What actually stopped me was that there was nothing like Hugging Face available to pull models from. The models that were around were the hundreds-of-billions-of-parameters versions, or I could write code to train my own from scratch.