LLM writes low-level code and improves performance by 2x
An incredible mise en abyme happened yesterday: a developer got a 2x performance improvement on a specific part of llama.cpp by letting DeepSeek-R1 write the code.
As the developer put it: "Surprisingly, 99% of the code in this PR was written by DeepSeek-R1. The only thing I did was develop tests and write prompts (with some trial and error)."
According to the prompts shared by the developer, the model spent 3 to 5 minutes thinking per response.
convert ARM NEON to WASM SIMD prompt (gist.github.com)