Quantization is a key component of accelerating neural networks efficiently. Over the years, multiple research works have shown the potential benefits of various flavours of mixed low-precision quantization. In practice, however, quantization support in mainstream deep learning frameworks has evolved little beyond targeting specific backends, typically at 8 bits, through black-box interfaces that leave little control to end users or researchers. To fill this gap, we present Brevitas, a PyTorch library for neural network quantization focused on quantization-aware training. Brevitas provides building blocks at multiple levels of abstraction to model a reduced-precision datapath at training time. Thanks to its flexibility, practitioners can use it to apply a variety of state-of-the-art quantization algorithms targeting multiple kinds of hardware backends, while researchers can leverage it to implement novel quantization algorithms or model hypothetical future accelerators.