=============================
Automatic precision inference
=============================

The automatic precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.infer_precision.InferPrecisionTypes`) attempts to infer the appropriate
widths for a given precision. It is initiated by setting a precision in the configuration to ``'auto'``. (Note that only layer-level precisions can be set to ``'auto'``,
not model-level ones.) Functions like :py:class:`~hls4ml.utils.config.config_from_keras_model`, :py:class:`~hls4ml.utils.config.config_from_onnx_model`,
and :py:class:`~hls4ml.utils.config.config_from_pytorch_model` automatically set most precisions to ``'auto'`` when the ``'name'`` granularity is used.
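
For example, a minimal sketch of requesting ``'auto'`` precisions from a Keras model (the ``model`` object, the ``dense_1`` layer name, and the choice of backend are illustrative assumptions):

.. code-block:: Python

    import hls4ml

    # Per-layer ('name' granularity) configuration; most precisions are set to
    # 'auto' automatically. Passing the backend lets the function extract all
    # of the configurable precisions (see the note below).
    config = hls4ml.utils.config_from_keras_model(model, granularity='name', backend='Vitis')

    # An individual layer-level precision can also be set to 'auto' by hand
    config['LayerName']['dense_1']['Precision']['accum'] = 'auto'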

.. note::
   It is recommended to pass the backend to the ``config_from_*`` functions so that they can properly extract all the configurable precisions.

The approach taken by the precision inference is to set the accumulator (the internal variable used to accumulate values in the matrix multiplications) and other precisions
so that they never truncate, using only the bit widths of the inputs (not the values). This is quite conservative, especially in cases where post-training quantization is used, or
if the bit widths were set fairly loosely. The recommended action in that case is to edit the configuration and explicitly set some widths in it, potentially in an iterative process
after profiling the data. Another option is to pass a maximum precision using the ``max_precision`` parameter of the ``config_from_*`` functions. The automatic precision
inference will then never set a bit width larger than that of ``max_precision``, or an integer part larger than that of ``max_precision``.
(The bit width and integer part of ``max_precision`` are treated separately.)
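
For instance, a cap on the inferred widths could be requested as follows (a sketch; the chosen widths are arbitrary and the precision string follows the usual ``fixed<width,integer>`` syntax):

.. code-block:: Python

    # Inferred types will not exceed 18 total bits or 8 integer bits;
    # the bit width and the integer part are capped separately.
    config = hls4ml.utils.config_from_keras_model(
        model, granularity='name', backend='Vitis', max_precision='fixed<18,8>'
    )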

When manually setting bit widths, the accumulator can overflow, and the precision may need to be reduced. For the accumulator, it is usually a bad idea to explicitly
enable rounding or saturation modes since this dramatically increases the execution time. For other types (e.g., output types or weight types), however, rounding and saturation handling
can be enabled as needed.
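
For instance, rounding and saturation might be enabled on a layer's output type like this (a sketch; the layer name and widths are illustrative, and ``AP_RND``/``AP_SAT`` are the Vivado-style mode names):

.. code-block:: Python

    # Round and saturate the result type of one layer, but leave the
    # accumulator alone, where these modes are costly.
    config['LayerName']['dense_1']['Precision']['result'] = 'ap_fixed<16,6,AP_RND,AP_SAT>'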

==================================
Loading weights from external BRAM
==================================

.. note::
   This feature is being evaluated for re-implementation. We welcome feedback from users on how to make the implementation more flexible.

``hls4ml`` can optionally store weights in BRAMs external to the design. This is supported in the Vivado/Vitis and Catapult backends. It is the responsibility of the user to ensure the weights are properly loaded during the operation of the design.

The feature works as a threshold, exposed through the ``BramFactor`` config parameter. Layers with more weights than the threshold will be exposed as a BRAM interface. Consider the following code:

.. code-block:: Python

    import hls4ml
    import tensorflow as tf
    from tensorflow.keras.layers import Dense

    model = tf.keras.models.Sequential()
    model.add(Dense(10, activation="relu", input_shape=(12,), name="dense_1"))
    model.add(Dense(20, activation="relu", name="dense_2"))
    model.add(Dense(5, activation="softmax", name="dense_3"))
    model.compile(optimizer='adam', loss='mse')

    config = hls4ml.utils.config_from_keras_model(model)
    config["Model"]["Strategy"] = "Resource"
    config["Model"]["BramFactor"] = 100
    # output_dir, io_type and backend are assumed to be defined by the user
    hls_model = hls4ml.converters.convert_from_keras_model(
        model, hls_config=config, output_dir=output_dir, io_type=io_type, backend=backend
    )

Having set ``BramFactor=100``, only layers with more than 100 weights will be exposed as external BRAM, in this case layers ``dense_1`` and ``dense_2``. ``BramFactor`` can currently only be set at the model level. The generated code will now have the weights as part of the interface:

.. code-block:: C++

    void myproject(
        hls::stream<input_t> &dense_1_input,
        hls::stream<result_t> &layer7_out,
        model_default_t w2[120],
        model_default_t w4[200]
    ) {
        #pragma HLS INTERFACE axis port=dense_1_input,layer7_out
        #pragma HLS INTERFACE bram port=w2,w4
        ...

When integrating the design, users can use the exposed interface to implement a weight reloading scheme, as in the sketch below.
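
As a rough illustration, a testbench-style C++ snippet could drive the exposed weight ports directly (``read_weights_from_file`` is a hypothetical user-provided helper, not part of hls4ml):

.. code-block:: C++

    // Buffers matching the BRAM ports exposed by myproject above
    model_default_t w2[120];
    model_default_t w4[200];

    // Fill the buffers with a new set of weights (hypothetical helper)
    read_weights_from_file("dense_1_w.txt", w2, 120);
    read_weights_from_file("dense_2_w.txt", w4, 200);

    hls::stream<input_t> dense_1_input;
    hls::stream<result_t> layer7_out;
    // ... write inputs to dense_1_input, then run the design ...
    myproject(dense_1_input, layer7_out, w2, w4);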

===================================
High Granularity Quantization (HGQ)
===================================

.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
   :target: https://calad0i.github.io/HGQ/
.. image:: https://badge.fury.io/py/hgq.svg
   :target: https://badge.fury.io/py/hgq
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
   :target: https://arxiv.org/abs/2405.00645

`High Granularity Quantization (HGQ) <https://github.com/calad0i/HGQ/>`_ is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it allows bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. image:: https://calad0i.github.io/HGQ/_images/overview.svg
   :alt: Overview of HGQ
   :align: center

Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to the proxy model format, which can then be parsed by hls4ml bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.

.. code-block:: Python

    import keras
    from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
    from HGQ import ResetMinMax, FreeBOPs

    # beta scales the resource (BOPs) regularization: larger values trade
    # accuracy for smaller bitwidths
    model = keras.models.Sequential([
        HQuantize(beta=1.e-5),
        HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
        HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
        HDense(10, beta=1.e-5),
    ])

    opt = keras.optimizers.Adam(learning_rate=0.001)
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
    callbacks = [ResetMinMax(), FreeBOPs()]

    model.fit(..., callbacks=callbacks)

    from HGQ import trace_minmax, to_proxy_model
    from hls4ml.converters import convert_from_keras_model

    # Calibrate the quantization ranges on training data, then convert to
    # the bit-accurate proxy model and hand it to hls4ml
    trace_minmax(model, x_train, cover_factor=1.0)
    proxy = to_proxy_model(model, aggressive=True)

    model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)

An interactive example of HGQ can be found in the `kaggle notebook <https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1>`_. Full documentation can be found at `calad0i.github.io/HGQ <https://calad0i.github.io/HGQ/>`_.