Type: Local
From: Cartesia
Quantisation: uint8
Precision: No
Size: 8B
The Llamba models are efficient, high-performance language models designed for edge computing as part of Cartesia's Edge library. These recurrence-based models use distillation to achieve strong performance on standard benchmarks while remaining computationally efficient. Available in three sizes (1B, 3B, and 8B parameters), Llamba models are optimized for deployment on resource-constrained devices and support multiple frameworks, including PyTorch and MLX for Metal hardware acceleration.
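As a rough illustration of the PyTorch path, the sketch below loads a Llamba checkpoint and generates a short completion through Hugging Face Transformers. This is a minimal sketch, not taken from the Edge documentation: it assumes the 8B checkpoint is published on the Hugging Face Hub under a repo id like `cartesia-ai/Llamba-8B` and that the repository ships the custom model code needed for `trust_remote_code=True`; the repo id, prompt, and generation settings are all illustrative.

```python
# Minimal sketch (assumptions noted above): loading a Llamba checkpoint via
# Hugging Face Transformers. The repo id is illustrative; the checkpoint is
# assumed to bundle its own model code, hence trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "cartesia-ai/Llamba-8B"  # illustrative/assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # lower memory footprint for edge-class hardware
    trust_remote_code=True,       # custom recurrent architecture code from the repo
)
model.eval()

prompt = "Edge devices benefit from recurrent language models because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

On Apple hardware, the same checkpoints are distributed for MLX so that inference runs on Metal rather than CUDA; the loading pattern is analogous but goes through the MLX side of the Edge library.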