BitNet-git
bitnet-git packages bitnet.cpp, Microsoft's official inference framework for 1-bit Large Language Models (LLMs). It uses optimized lookup-table kernels for fast, energy-efficient inference of 1.58-bit quantized models on CPUs and GPUs.
Installation
The easiest way to install is using an AUR helper like paru or yay:
$ paru -S bitnet-git
Alternatively, you can build manually using makepkg:
$ git clone https://aur.archlinux.org/bitnet-git.git
$ cd bitnet-git
$ makepkg -si
Hardware Optimization
The package automatically detects your architecture and uses the most appropriate kernels:
- x86_64: Uses the TL2 lookup-table kernel for maximum performance.
- aarch64: Uses the TL1 lookup-table kernel (tuned for ARMv8.2+).
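To verify which SIMD extensions your CPU exposes (the x86_64 TL2 kernels benefit from AVX2, for example), you can inspect /proc/cpuinfo; which flags a given kernel actually uses is an upstream implementation detail:
# Prints "avx2" if the CPU supports it (x86_64)
$ grep -m1 -o avx2 /proc/cpuinfo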
Global Models Management
To streamline your workflow, we recommend setting up a global models directory and a shell helper. This allows you to run models by name without typing full paths or URIs.
Create the Models Directory
Create a standard directory in your home folder:
$ mkdir -p ~/.local/share/bitnet/models
Configure Your Shell
Add the following to your ~/.bashrc or ~/.zshrc:
# BitNet Models Directory
export BITNET_MODELS_DIR="$HOME/.local/share/bitnet/models"

# BitNet Runner Helper
bitnet-run() {
    if [ -z "$1" ]; then
        echo "Usage: bitnet-run <model_filename> [additional_args]"
        return 1
    fi
    local model_name="$1"
    shift
    llama-cli -m "$BITNET_MODELS_DIR/$model_name" "$@"
}
Reload your shell: `source ~/.bashrc` (or `~/.zshrc`).
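Optionally, you can define a small companion helper (a convenience sketch, not part of the package) that lists the GGUF files already present, so you do not have to remember exact filenames:
# List downloaded GGUF models in the shared directory
bitnet-models() {
    ls -lh "$BITNET_MODELS_DIR"/*.gguf 2>/dev/null \
        || echo "No models found in $BITNET_MODELS_DIR"
}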
Download a Model
Download a recommended model directly into your new directory:
# Download the BitNet 2B model
$ wget -P "$BITNET_MODELS_DIR" https://huggingface.co/microsoft/BitNet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf
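Alternatively, if python-huggingface-hub is installed, huggingface-cli can fetch the same file (repository and filename taken from the wget command above):
$ huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf ggml-model-i2_s.gguf --local-dir "$BITNET_MODELS_DIR"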
Run Inference with Ease
Now you can run the model simply by referencing its filename:
$ bitnet-run ggml-model-i2_s.gguf -p "What are the benefits of 1-bit LLMs?" -cnv
Options
- -m <path>: Path to the GGUF model file.
- -p <"prompt">: Initial prompt for the model.
- -t <threads>: Number of CPU threads to use (e.g., -t 4).
- --temp <value>: Sampling temperature controlling randomness (e.g., --temp 0.7).
- -cnv: Enable conversation/chat mode.
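These flags combine freely. For example, the following invocation of the helper defined above pins the thread count and lowers the sampling temperature:
$ bitnet-run ggml-model-i2_s.gguf -p "Explain 1-bit quantization." -t 4 --temp 0.7 -cnv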
Serving the Model via API
You can also run a local API server compatible with OpenAI's API. Note that llama-cli (and therefore the bitnet-run helper) only performs interactive inference; serving requires the separate llama-server binary, assuming your build installs it:
$ llama-server -m "$BITNET_MODELS_DIR/ggml-model-i2_s.gguf" --port 8080
Then you can access it via http://localhost:8080.
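As a quick smoke test, you can query the OpenAI-compatible chat endpoint that llama-server exposes (endpoint path per upstream llama.cpp; the exact response format may vary between versions):
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is a 1-bit LLM?"}]}'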
Recommended Models
A selection of 1.58-bit models available in GGUF format:
| Model | Parameters | Size (GGUF) | Description |
|---|---|---|---|
| bitnet_b1_58-large | 0.7B | ~150 MB | Blazing fast, great for testing. |
| BitNet-b1.58-2B-4T | 2.4B | ~500 MB | Best overall balance for daily use. |
| bitnet_b1_58-3B | 3.3B | ~700 MB | High performance, slightly more capable. |
| Llama3-8B-1.58 | 8.0B | ~1.6 GB | High quality, requires more RAM. |
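As a rule of thumb, ternary weights cost about 1.58 bits (roughly 0.2 bytes) per parameter, so 2.4B parameters work out to about 2.4e9 × 1.58 / 8 ≈ 474 MB, in line with the ~500 MB figure above; embeddings and metadata add some overhead on top.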
Troubleshooting
- Build failures: Ensure you have base-devel, cmake, and clang installed.
- Model errors: Verify the model file is a valid GGUF and resides in your $BITNET_MODELS_DIR.
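A quick way to sanity-check a model file is to read its magic bytes, since every valid GGUF file begins with the ASCII string GGUF:
# Prints "GGUF" for a valid model file
$ head -c 4 "$BITNET_MODELS_DIR/ggml-model-i2_s.gguf"; echo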