ALIA-40B is a 40B parameter base language model developed by the Barcelona Supercomputing Center (BSC).
Original model and details here: https://huggingface.co/BSC-LT/ALIA-40b
This model is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are made publicly available in this GitHub repository.
ALIA-40B is a transformer-based, decoder-only language model that has been pre-trained from scratch on 9.37 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages as well as programming code.
The full list of hyperparameters can be found here.
| Hyperparameter | Value |
| --- | --- |
| Total Parameters | 40,433,885,184 |
| Embedding Parameters | 2,097,152,000 |
| Layers | 48 |
| Hidden size | 8,192 |
| Attention heads | 64 |
| Context length | 32,768 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | ✅ |
| Grouped Query Attention | ✅ |
| Num. query groups | 8 |
ollama run csala/ALIA-40B
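Since ALIA-40B is a base model (not instruction-tuned), it completes text rather than following chat-style instructions. A minimal usage sketch with a hypothetical completion-style prompt:

```bash
# Hypothetical prompt; the base model continues the text rather than answering an instruction
ollama run csala/ALIA-40B "La intel·ligència artificial és"
```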
These are the steps that were followed to convert the weights to GGUF format and quantize the model.
Requirement: huggingface_hub
huggingface-cli download --cache-dir $HF_CACHE_DIR BSC-LT/ALIA-40b
HF_CACHE_DIR points at the directory where the raw (safetensors) weights will be stored; it can be any local directory with enough free disk space.
This command downloads the model into the directory $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/
The safetensors files end up inside $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/, where <snapshot-id> is the latest snapshot available in HuggingFace.
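As a rough end-to-end sketch of this download step (assuming a pip-based install and a hypothetical cache location):

```bash
# Install the Hugging Face Hub client, which provides huggingface-cli
pip install huggingface_hub

# Hypothetical cache location; any directory with enough free space works
export HF_CACHE_DIR=/data/hf-cache

# Download the raw safetensors weights
huggingface-cli download --cache-dir $HF_CACHE_DIR BSC-LT/ALIA-40b

# The <snapshot-id> can be found by listing the snapshots directory
ls $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/
```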
Requirement: llama.cpp repository and python requirements installed.
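One way to satisfy this requirement (a sketch, assuming a pip environment and cloning directly into $LLAMA_PATH):

```bash
# Clone llama.cpp and install the Python dependencies used by the conversion script
git clone https://github.com/ggerganov/llama.cpp.git $LLAMA_PATH
pip install -r $LLAMA_PATH/requirements.txt
```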
cd $LLAMA_PATH
python convert_hf_to_gguf.py $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/ --outfile $ALIA_PATH/ALIA-40B.gguf
LLAMA_PATH is the root of the llama.cpp directory.
ALIA_PATH is the directory where we want to store the ALIA-40B GGUF files and Modelfile.
This creates the file $ALIA_PATH/ALIA-40B.gguf, which we will use as the source for the different quantized versions.
Requirement: llama.cpp built and installed.
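If llama.cpp has not been built yet, a minimal CMake build (a sketch; exact flags may vary by platform) produces the llama-quantize binary under build/bin:

```bash
cd $LLAMA_PATH
cmake -B build
cmake --build build --config Release -j
# The quantization tool is then available at $LLAMA_PATH/build/bin/llama-quantize
```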
For each quantized version $QUANTIZATION that we want to generate (e.g. Q4_K), run:
cd $ALIA_PATH
llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION
This generates the file ALIA-40B.$QUANTIZATION.gguf (e.g. ALIA-40B.Q4_K.gguf) within the same directory, with the weights quantized to the indicated level.
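To produce several quantized variants in one go, a simple loop over a hypothetical selection of levels could be used:

```bash
cd $ALIA_PATH
# Hypothetical list of quantization levels; adjust to taste
for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION
done
```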
Modelfile
For each quantized version that we want to import into Ollama, create a Modelfile_$QUANTIZATION with the following contents (replace $QUANTIZATION with the actual value):
FROM ./ALIA-40B.$QUANTIZATION.gguf
For example, for the Q4_K quantization level we will create the file Modelfile_Q4_K with contents:
FROM ./ALIA-40B.Q4_K.gguf
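The Modelfile can also be generated from the shell, for example (a sketch for the Q4_K level):

```bash
cd $ALIA_PATH
printf 'FROM ./ALIA-40B.Q4_K.gguf\n' > Modelfile_Q4_K
```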
For each quantized version, import the model into Ollama using the command:
ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION
NOTE: The quantization tag is written in lowercase, following Ollama's naming convention.
For example, for the Q4_K quantization level we will run:
ollama create ALIA-40B:q4_k -f Modelfile_Q4_K
For each quantized version, push the model to the Ollama registry using the commands:
ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
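Putting the last steps together, a hedged end-to-end sketch for a few hypothetical quantization levels (assuming the GGUF files already exist in $ALIA_PATH and you are logged in to ollama.com as csala):

```bash
cd $ALIA_PATH
for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    # Lowercase tag, per Ollama naming convention
    LOWERCASE_QUANTIZATION=$(echo "$QUANTIZATION" | tr '[:upper:]' '[:lower:]')

    # Generate the Modelfile, import locally, then copy and push under the csala namespace
    printf 'FROM ./ALIA-40B.%s.gguf\n' "$QUANTIZATION" > Modelfile_$QUANTIZATION
    ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION
    ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
    ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
done

# Verify the local tags
ollama list
```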