LLM writes low-level code and improves performance by 2x
An incredible mise en abyme happened yesterday: a developer got a 2x performance improvement on a specific part of llama.cpp by letting DeepSeek-R1 write the code.
As the developer put it: "Surprisingly, 99% of the code in this PR was written by DeepSeek-R1. The only thing I did was develop tests and write prompts (with some trial and error)."
According to the prompts shared by the developer, the model spent 3 to 5 minutes thinking per response.
convert ARM NEON to WASM SIMD prompt (gist.github.com)