Merge branch 'main' into brevitas
JanFSchulte authored Jan 10, 2025
2 parents 399613e + 3fa2902 commit 73590b7
Showing 324 changed files with 22,749 additions and 2,376 deletions.
2 changes: 1 addition & 1 deletion .gitlab-ci.yml
@@ -7,7 +7,7 @@ generator:
stage: generate
image: python:3.8-alpine
variables:
-  N_TESTS_PER_YAML: 5
+  N_TESTS_PER_YAML: 4
tags:
- k8s-default
before_script:
12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -2,15 +2,15 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte

repos:
- repo: https://github.com/psf/black
-  rev: 24.4.2
+  rev: 24.10.0
hooks:
- id: black
language_version: python3
args: ['--line-length=125',
'--skip-string-normalization']

- repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.6.0
+  rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
@@ -30,18 +30,18 @@ repos:
args: ["--profile", "black", --line-length=125]

- repo: https://github.com/asottile/pyupgrade
-  rev: v3.16.0
+  rev: v3.19.1
hooks:
- id: pyupgrade
args: ["--py36-plus"]

- repo: https://github.com/asottile/setup-cfg-fmt
-  rev: v2.5.0
+  rev: v2.7.0
hooks:
- id: setup-cfg-fmt

- repo: https://github.com/pycqa/flake8
-  rev: 7.1.0
+  rev: 7.1.1
hooks:
- id: flake8
exclude: docs/conf.py
@@ -50,7 +50,7 @@
'--extend-ignore=E203,T201'] # E203 is not PEP8 compliant

- repo: https://github.com/mgedmin/check-manifest
rev: "0.49"
rev: "0.50"
hooks:
- id: check-manifest
stages: [manual]
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -4,7 +4,7 @@ type: software
authors:
- given-names: "FastML Team"
title: "hls4ml"
version: "v0.8.1"
version: "v1.0.0"
doi: 10.5281/zenodo.1201549
repository-code: "https://github.com/fastmachinelearning/hls4ml"
url: "https://fastmachinelearning.org/hls4ml"
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -16,7 +16,7 @@ pipeline {
sh '''#!/bin/bash --login
conda activate hls4ml-py310
conda install -y jupyterhub pydot graphviz pytest pytest-cov
-pip install pytest-randomly jupyter onnx>=1.4.0 matplotlib pandas seaborn pydigitalwavetools==1.1 pyyaml tensorflow==2.14 qonnx torch git+https://github.com/google/qkeras.git pyparsing
+pip install pytest-randomly jupyter onnx>=1.4.0 matplotlib pandas seaborn pydigitalwavetools==1.1 pyyaml tensorflow==2.14 qonnx torch git+https://github.com/jmitrevs/qkeras.git@qrecurrent_unstack pyparsing
pip install -U ../ --user
./convert-keras-models.sh -x -f keras-models.txt
pip uninstall hls4ml -y'''
16 changes: 11 additions & 5 deletions README.md
@@ -15,7 +15,9 @@ If you have any questions, comments, or ideas regarding hls4ml or just want to s

# Documentation & Tutorial

-For more information visit the webpage: [https://fastmachinelearning.org/hls4ml/](https://fastmachinelearning.org/hls4ml/)
+For more information visit the webpage: [https://fastmachinelearning.org/hls4ml/](https://fastmachinelearning.org/hls4ml/).
+
+For introductory material on FPGAs, HLS and ML inference using hls4ml, check out the [video](https://www.youtube.com/watch?v=2y3GNY4tf7A&ab_channel=SystemsGroupatETHZ%C3%BCrich).

Detailed tutorials on how to use `hls4ml`'s various functionalities can be found [here](https://github.com/hls-fpga-machine-learning/hls4ml-tutorial).

@@ -49,8 +51,8 @@ hls_model = hls4ml.converters.keras_to_hls(config)
hls4ml.utils.fetch_example_list()
```

-### Building a project with Xilinx Vivado HLS (after downloading and installing from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html))
-Note: Vitis HLS is not yet supported. Vivado HLS versions between 2018.2 and 2020.1 are recommended.
+### Building a project
+We will build the project using Xilinx Vivado HLS, which can be downloaded and installed from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html). Alongside Vivado HLS, hls4ml also supports Vitis HLS, Intel HLS, and Catapult HLS, and has some experimental support for Intel oneAPI. The target backend can be changed using the `backend` argument when building the model.

```Python
# Use Vivado HLS to synthesize the model
@@ -61,15 +63,19 @@ hls_model.build()
hls4ml.report.read_vivado_report('my-hls-test')
```

+# FAQ
+
+A list of frequently asked questions and common HLS synthesis issues can be found [here](https://fastmachinelearning.org/hls4ml/faq.html).

# Citation
If you use this software in a publication, please cite the software
```bibtex
@software{fastml_hls4ml,
author = {{FastML Team}},
title = {fastmachinelearning/hls4ml},
-  year = 2023,
+  year = 2024,
publisher = {Zenodo},
-  version = {v0.8.1},
+  version = {v1.0.0},
doi = {10.5281/zenodo.1201549},
url = {https://github.com/fastmachinelearning/hls4ml}
}
22 changes: 22 additions & 0 deletions docs/advanced/auto.rst
@@ -0,0 +1,22 @@
=============================
Automatic precision inference
=============================

The automatic precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.infer_precision.InferPrecisionTypes`) attempts to infer the appropriate
widths for a given precision. It is initiated by setting a precision in the configuration as ``'auto'``. (Note that only layer-level precisions can be set to ``'auto'``,
not model-level.) Functions like :py:class:`~hls4ml.utils.config.config_from_keras_model`, :py:class:`~hls4ml.utils.config.config_from_onnx_model`,
and :py:class:`~hls4ml.utils.config.config_from_pytorch_model` automatically set most precisions to ``'auto'`` if the ``'name'`` granularity is used.

.. note::
   It is recommended to pass the backend to the ``config_from_*`` functions so that they can properly extract all the configurable precisions.

The approach taken by the precision inference is to set the accumulator (the internal variable used to accumulate values in the matrix multiplications) and other precisions
to never truncate, using only the bitwidths of the inputs (not the values). This is quite conservative, especially in cases where post-training quantization is used, or
if the bit widths were set fairly loosely. The recommended action in that case is to edit the configuration and explicitly set some widths in it, potentially in an iterative process
after profiling the data. Another option is to pass a maximum precision using the ``max_precision`` parameter of the ``config_from_*`` functions. Then the automatic precision
inference will never set a bitwidth larger than the bitwidth of the ``max_precision`` or an integer part larger than the integer part of the ``max_precision`` that is passed.
(The bitwidth and integer parts of the ``max_precision`` are treated separately.)

When manually setting bitwidths, the accumulator can overflow, and the precision may need to be reduced. For the accumulator, it is usually a bad idea to explicitly
enable rounding or saturation modes since it dramatically increases the execution time. For other types (e.g. output types or weight types), however, rounding and saturation handling
can be enabled as needed.
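
A minimal sketch of how this fits together, assuming a Keras model ``model``; the backend and ``max_precision`` values below are illustrative:

.. code-block:: Python

import hls4ml

# 'name' granularity sets most layer-level precisions to 'auto'
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',
    backend='Vitis',                  # recommended, so configurable precisions are extracted
    max_precision='ap_fixed<24,12>',  # cap for the inferred widths (illustrative)
)

# Precisions still set to 'auto' are inferred during conversion
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis', output_dir='my-hls-test'
)
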
42 changes: 42 additions & 0 deletions docs/advanced/bramfactor.rst
@@ -0,0 +1,42 @@
==================================
Loading weights from external BRAM
==================================

.. note::
   This feature is being evaluated for re-implementation. We welcome feedback from users on how to make the implementation more flexible.

``hls4ml`` can optionally store weights in BRAMs external to the design. This is supported in Vivado/Vitis and Catapult backends. It is the responsibility of the user to ensure the weights are properly loaded during the operation of the design.

The feature works as a threshold, exposed through a ``BramFactor`` config parameter. Layers with more weights than the threshold will be exposed as a BRAM interface. Consider the following code:

.. code-block:: Python

import tensorflow as tf
from tensorflow.keras.layers import Dense
import hls4ml

model = tf.keras.models.Sequential()
model.add(Dense(10, activation="relu", input_shape=(12,), name="dense_1"))
model.add(Dense(20, activation="relu", name="dense_2"))
model.add(Dense(5, activation="softmax", name="dense_3"))
model.compile(optimizer='adam', loss='mse')

config = hls4ml.utils.config_from_keras_model(model)
config["Model"]["Strategy"] = "Resource"
config["Model"]["BramFactor"] = 100

# output_dir, io_type and backend are assumed to be defined elsewhere
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir=output_dir, io_type=io_type, backend=backend
)
Having set ``BramFactor=100``, only layers with more than 100 weights will be exposed as external BRAM, in this case layers ``dense_1`` and ``dense_2``: their kernels hold 12 × 10 = 120 and 10 × 20 = 200 weights (matching ``w2[120]`` and ``w4[200]`` below), while ``dense_3``'s 20 × 5 = 100 weights do not exceed the threshold. ``BramFactor`` can currently only be set at the model level. The generated code will now have weights as part of the interface.

.. code-block:: C++

void myproject(
hls::stream<input_t> &dense_1_input,
hls::stream<result_t> &layer7_out,
model_default_t w2[120],
model_default_t w4[200]
) {
#pragma HLS INTERFACE axis port=dense_1_input,layer7_out
#pragma HLS INTERFACE bram port=w2,w4
...

When integrating the design, users can use the exposed interface to implement a weight reloading scheme.
49 changes: 49 additions & 0 deletions docs/advanced/hgq.rst
@@ -0,0 +1,49 @@
===================================
High Granularity Quantization (HGQ)
===================================

.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
:target: https://calad0i.github.io/HGQ/
.. image:: https://badge.fury.io/py/hgq.svg
:target: https://badge.fury.io/py/hgq
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
:target: https://arxiv.org/abs/2405.00645

`High Granularity Quantization (HGQ) <https://github.com/calad0i/HGQ/>`_ is a library that provides gradient-based automatic bitwidth optimization and a quantization-aware training algorithm for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. image:: https://calad0i.github.io/HGQ/_images/overview.svg
:alt: Overview of HGQ
:align: center

Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to a proxy model format, which can then be parsed by hls4ml bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.

.. code-block:: Python

import keras
from HGQ import FreeBOPs, ResetMinMax, to_proxy_model, trace_minmax
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from hls4ml.converters import convert_from_keras_model

model = keras.models.Sequential([
    HQuantize(beta=1.e-5),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

# HGQ training callbacks (reset recorded activation ranges / track BOPs)
callbacks = [ResetMinMax(), FreeBOPs()]
model.fit(..., callbacks=callbacks)

trace_minmax(model, x_train, cover_factor=1.0)
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)

An interactive example of HGQ can be found in the `kaggle notebook <https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1>`_. Full documentation can be found at `calad0i.github.io/HGQ <https://calad0i.github.io/HGQ/>`_.
22 changes: 11 additions & 11 deletions docs/advanced/model_optimization.rst
@@ -13,11 +13,11 @@ The code block below showcases three use cases of the hls4ml Optimization API -
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.losses import CategoricalCrossentropy
-from hls4ml.optimization.keras import optimize_model
-from hls4ml.optimization.keras.utils import get_model_sparsity
-from hls4ml.optimization.attributes import get_attributes_from_keras_model
-from hls4ml.optimization.objectives import ParameterEstimator
-from hls4ml.optimization.scheduler import PolynomialScheduler
+from hls4ml.optimization.dsp_aware_pruning.keras import optimize_model
+from hls4ml.optimization.dsp_aware_pruning.keras.utils import get_model_sparsity
+from hls4ml.optimization.dsp_aware_pruning.attributes import get_attributes_from_keras_model
+from hls4ml.optimization.dsp_aware_pruning.objectives import ParameterEstimator
+from hls4ml.optimization.dsp_aware_pruning.scheduler import PolynomialScheduler
# Define baseline model and load data
# X_train, y_train = ...
# X_val, y_val = ...
@@ -75,7 +75,7 @@ To optimize GPU FLOPs, the code is similar to above:

.. code-block:: Python
-from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator
+from hls4ml.optimization.dsp_aware_pruning.objectives.gpu_objectives import GPUFLOPEstimator
# Optimize model
# Note the change from ParameterEstimator to GPUFLOPEstimator
@@ -98,7 +98,7 @@ Finally, optimizing Vivado DSPs is possible, given a hls4ml config:
.. code-block:: Python
from hls4ml.utils.config import config_from_keras_model
-from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator
+from hls4ml.optimization.dsp_aware_pruning.objectives.vivado_objectives import VivadoDSPEstimator
# Note the change from optimize_model to optimize_keras_model_for_hls4ml
# The function optimize_keras_model_for_hls4ml acts as a wrapper for the function, parsing hls4ml config to model attributes
@@ -124,11 +124,11 @@
acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
print(f'Optimized Keras accuracy: {acc_optimized}')
-There are two more Vivado "optimizers" - VivadoFFEstimator, aimed at reducing register utilisation and VivadoMultiObjectiveEstimator, aimed at optimising BRAM and DSP utilisation.
-Note, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesing HLS, by modifying the config:
+There are two more Vivado "optimizers" - VivadoFFEstimator, aimed at reducing register utilization, and VivadoMultiObjectiveEstimator, aimed at optimizing BRAM and DSP utilization.
+Note, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesizing HLS, by modifying the config:

.. code-block:: Python
hls_config = config_from_keras_model(optimized_model)
-hls_config['Model']['DenseResourceImplementation'] = 'Unrolled'
-# Any addition hls4ml config, such as strategy, reuse factor etc...
+hls_config['Model']['Strategy'] = 'Unrolled'
+# Any additional hls4ml config, reuse factor etc...
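
Put together, a sketch of the updated flow might look like this (``optimized_model`` comes from the optimization step above; the backend and output directory are illustrative):

.. code-block:: Python

import hls4ml
from hls4ml.utils.config import config_from_keras_model

hls_config = config_from_keras_model(optimized_model)
hls_config['Model']['Strategy'] = 'Unrolled'  # unrolled Dense multiplication so DSPs can be optimized

hls_model = hls4ml.converters.convert_from_keras_model(
    optimized_model, hls_config=hls_config, backend='Vivado', output_dir='my-optimized-hls'
)
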
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/command.rst → docs/api/command.rst
@@ -50,7 +50,7 @@ hls4ml config
hls4ml config [-h] [-m MODEL] [-w WEIGHTS] [-o OUTPUT]
-This creates a conversion configuration file. Visit Configuration section of the :doc:`Setup <setup>` page for more details on how to write a configuration file.
+This creates a conversion configuration file. Visit the Configuration section of the :doc:`Setup <../intro/setup>` page for more details on how to write a configuration file.

**Arguments**
