Type: Local
From: Cartesia
Quantisation: uint4
Precision: No
Size: 3B
Llamba is a family of efficient recurrent neural network models developed by Cartesia as part of their Edge library for high-performance machine learning applications. The models are designed to deliver strong performance while remaining computationally efficient, making them well suited to edge deployment. Llamba is available in 1B, 3B, and 8B parameter variants and has been evaluated on standard benchmarks including ARC, PIQA, Winogrande, HellaSwag, LAMBADA, MMLU, and OpenBookQA. The models can be run with both PyTorch and Metal, and support inference in bfloat16 precision on GPU hardware.
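As a rough illustration of bfloat16 inference on the PyTorch side, the sketch below loads a causal language model with Hugging Face transformers. The repository id "cartesia-ai/Llamba-3B" and the use of `trust_remote_code` are assumptions for illustration, not details confirmed by this card; check the actual model repository for the correct id and loading instructions.

```python
# Hedged sketch: bfloat16 inference with Hugging Face transformers.
# The repo id below is an assumption for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cartesia-ai/Llamba-3B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 inference, as noted above
    device_map="auto",           # place the model on GPU if one is available
    trust_remote_code=True,      # recurrent architectures often ship custom model code
)

prompt = "The three laws of thermodynamics are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

On Apple hardware, the Metal path would instead go through Cartesia's Edge library rather than transformers; the snippet above covers only the generic PyTorch route.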