High-level network definitions with pre-trained weights in TensorFlow (tested with >= 1.2.0).
- Applicability. Many people already have their own ML workflows and want to plug a new model into them. TensorNets can be easily plugged in because it is designed as simple functional interfaces without custom classes.
- Manageability. Models are written in tf.contrib.layers, which is lightweight like PyTorch and Keras, and gives easy access to every weight and end-point. It is also easy to deploy and extend the collection of pre-processing functions and pre-trained weights.
- Readability. With recent TensorFlow APIs, more factoring and less indenting are possible. For example, all the Inception variants are implemented in about 500 lines of code in TensorNets, versus 2,000+ lines in the official TensorFlow models.
Each network (see the full list) is not a custom class but a function that takes and returns tf.Tensor as its input and output. Here is an example of ResNet50:
import tensorflow as tf
import tensornets as nets
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
model = nets.ResNet50(inputs)
assert isinstance(model, tf.Tensor)
You can load an example image with utils.load_img, which returns a np.ndarray in the NHWC format:
from tensornets import utils
img = utils.load_img('cat.png', target_size=256, crop_size=224)
assert img.shape == (1, 224, 224, 3)
Once your network has been created, you can run it with regular TensorFlow APIs because all the networks in TensorNets always return tf.Tensor. Loading pre-trained weights and pre-processing inputs are as easy as pretrained() and preprocess(), which reproduce the original results:
with tf.Session() as sess:
    img = model.preprocess(img)  # equivalent to img = nets.preprocess(model, img)
    sess.run(model.pretrained())  # equivalent to nets.pretrained(model)
    preds = sess.run(model, {inputs: img})
You can see the most probable classes:
print(utils.decode_predictions(preds, top=2)[0])
[(u'n02124075', u'Egyptian_cat', 0.28067636), (u'n02127052', u'lynx', 0.16826575)]
TensorNets makes it faster to deploy well-known architectures and benchmark their results. For more information, you can check out the lists of utilities, examples, and architectures.
Each object detection model can be coupled with any network in TensorNets (see performances) and takes two arguments: a placeholder and a function acting as a stem layer. Here is an example of YOLOv2 for PASCAL VOC:
import tensorflow as tf
import tensornets as nets
inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
model = nets.YOLOv2(inputs, nets.Darknet19)
img = nets.utils.load_img('cat.png')
with tf.Session() as sess:
    sess.run(model.pretrained())
    preds = sess.run(model, {inputs: model.preprocess(img)})
    boxes = model.get_boxes(preds, img.shape[1:3])
Like other models, a detection model also returns tf.Tensor as its output. You can see the bounding box predictions (x1, y1, x2, y2, score) by using model.get_boxes(model_output, original_img_shape) and visualize the results:
from tensornets.datasets import voc
print("%s: %s" % (voc.classnames[7], boxes[7][0])) # 7 is cat
import numpy as np
import matplotlib.pyplot as plt
box = boxes[7][0]
plt.imshow(img[0].astype(np.uint8))
plt.gca().add_patch(plt.Rectangle(
    (box[0], box[1]), box[2] - box[0], box[3] - box[1],
    fill=False, edgecolor='r', linewidth=2))
plt.show()
More detection examples, such as FasterRCNN on VOC2007, are here.
An example output of utils.print_summary(model):
Scope: resnet50
Total layers: 54
Total weights: 320
Total parameters: 25,636,712
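This summary can be reproduced with a call like the following (a minimal sketch, assuming the ResNet50 model created in the first example above):
from tensornets import utils
utils.print_summary(model)  # prints the scope and the layer, weight, and parameter counts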
An example output of utils.print_weights(model):
Scope: resnet50
conv1/conv/weights:0 (7, 7, 3, 64)
conv1/conv/biases:0 (64,)
conv1/bn/beta:0 (64,)
conv1/bn/gamma:0 (64,)
conv1/bn/moving_mean:0 (64,)
conv1/bn/moving_variance:0 (64,)
conv2/block1/0/conv/weights:0 (1, 1, 64, 256)
conv2/block1/0/conv/biases:0 (256,)
conv2/block1/0/bn/beta:0 (256,)
conv2/block1/0/bn/gamma:0 (256,)
...
utils.get_weights(model) returns a list of all the tf.Tensor weights as shown above.
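For illustration, the returned weights can be inspected like ordinary TensorFlow tensors; here is a small sketch assuming the ResNet50 model from above (the per-variable parameter count is added here only for demonstration):
import numpy as np
from tensornets import utils
weights = utils.get_weights(model)
for w in weights[:3]:
    # each entry is a tf.Tensor, so its name and static shape are available
    print(w.name, w.shape.as_list(), int(np.prod(w.shape.as_list())))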
An example output of utils.print_outputs(model):
Scope: resnet50
conv1/pad:0 (?, 230, 230, 3)
conv1/conv/BiasAdd:0 (?, 112, 112, 64)
conv1/bn/batchnorm/add_1:0 (?, 112, 112, 64)
conv1/relu:0 (?, 112, 112, 64)
pool1/pad:0 (?, 114, 114, 64)
pool1/MaxPool:0 (?, 56, 56, 64)
conv2/block1/0/conv/BiasAdd:0 (?, 56, 56, 256)
conv2/block1/0/bn/batchnorm/add_1:0 (?, 56, 56, 256)
conv2/block1/1/conv/BiasAdd:0 (?, 56, 56, 64)
conv2/block1/1/bn/batchnorm/add_1:0 (?, 56, 56, 64)
conv2/block1/1/relu:0 (?, 56, 56, 64)
...
utils.get_outputs(model) returns a list of all the tf.Tensor end-points as shown above.
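The end-points can be evaluated just like the final predictions, which is handy for extracting intermediate feature maps. A minimal self-contained sketch (ResNet50 and 'cat.png' simply follow the earlier examples):
import tensorflow as tf
import tensornets as nets
from tensornets import utils
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
model = nets.ResNet50(inputs)
middles = utils.get_outputs(model)  # all end-points as tf.Tensor
img = utils.load_img('cat.png', target_size=256, crop_size=224)
with tf.Session() as sess:
    sess.run(model.pretrained())
    outs = sess.run(middles, {inputs: model.preprocess(img)})
for (t, o) in zip(middles[:3], outs[:3]):
    print(t.name, o.shape)  # e.g. conv1/pad:0 (1, 230, 230, 3)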
- Comparison of different networks:
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
models = [
    nets.MobileNet75(inputs),
    nets.MobileNet100(inputs),
    nets.SqueezeNet(inputs),
]
img = utils.load_img('cat.png', target_size=256, crop_size=224)
imgs = nets.preprocess(models, img)
with tf.Session() as sess:
    nets.pretrained(models)
    for (model, img) in zip(models, imgs):
        preds = sess.run(model, {inputs: img})
        print(utils.decode_predictions(preds, top=2)[0])
- Transfer learning:
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
outputs = tf.placeholder(tf.float32, [None, 50])
model = nets.DenseNet169(inputs, is_training=True, classes=50)
loss = tf.losses.softmax_cross_entropy(outputs, model)
train = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)
with tf.Session() as sess:
    nets.pretrained(model)
    # for (x, y) in your NumPy data (the NHWC and one-hot format):
    sess.run(train, {inputs: x, outputs: y})
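The placeholder comment above can be filled with any NHWC images and one-hot labels; here is a sketch with random dummy data (the batch size and step count are arbitrary assumptions):
import numpy as np
x = np.random.random((8, 224, 224, 3)).astype(np.float32)       # dummy NHWC batch
y = np.eye(50)[np.random.randint(0, 50, 8)].astype(np.float32)  # dummy one-hot labels
with tf.Session() as sess:
    nets.pretrained(model)
    for _ in range(10):  # a few dummy training steps
        sess.run(train, {inputs: x, outputs: y})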
- Using multi-GPU:
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
models = []
with tf.device('gpu:0'):
    models.append(nets.ResNeXt50(inputs))
with tf.device('gpu:1'):
    models.append(nets.DenseNet201(inputs))
from tensornets.preprocess import fb_preprocess
img = utils.load_img('cat.png', target_size=256, crop_size=224)
img = fb_preprocess(img)
with tf.Session() as sess:
    nets.pretrained(models)
    preds = sess.run(models, {inputs: img})
    for pred in preds:
        print(utils.decode_predictions(pred, top=2)[0])
- The top-k errors were obtained with TensorNets on the ImageNet validation set and may slightly differ from the original ones. The crop size is 224x224 for all models, except 331x331 for NASNetAlarge and 299x299 for Inception3, Inception4, InceptionResNet2, and ResNet50v2-ResNet152v2.
- Top-1: single center crop, top-1 error
- Top-5: single center crop, top-5 error
- 10-5: ten crops (1 center + 4 corners, plus their mirrored versions), top-5 error
- Size: the rounded number of parameters
- The computation times were measured on NVIDIA Tesla P100 (3584 cores, 16 GB global memory) with cuDNN 6.0 and CUDA 8.0.
- Speed: milliseconds for inferences of 100 images
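The exact benchmark script is not shown here, but the Speed measurement can be sketched along these lines (assumptions: a random batch of 100 images and a single warm-up run; ResNet50 is used only as an example):
import time
import numpy as np
import tensorflow as tf
import tensornets as nets
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
model = nets.ResNet50(inputs)
batch = np.random.random((100, 224, 224, 3)).astype(np.float32)  # 100 dummy images
with tf.Session() as sess:
    sess.run(model.pretrained())
    sess.run(model, {inputs: batch})  # warm-up to exclude one-time overheads
    start = time.time()
    sess.run(model, {inputs: batch})
    print('%.1f ms for 100 images' % ((time.time() - start) * 1000))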
Model | Top-1 | Top-5 | 10-5 | Size | Speed | References |
---|---|---|---|---|---|---|
ResNet50 | 25.126 | 7.982 | 6.842 | 26M | 195.4 | [paper] [tf-slim] [torch-fb] [caffe] [keras] |
ResNet101 | 23.580 | 7.214 | 6.092 | 45M | 311.7 | [paper] [tf-slim] [torch-fb] [caffe] |
ResNet152 | 23.396 | 6.882 | 5.908 | 60M | 439.1 | [paper] [tf-slim] [torch-fb] [caffe] |
ResNet50v2 | 24.526 | 7.252 | 6.012 | 26M | 209.7 | [paper] [tf-slim] [torch-fb] |
ResNet101v2 | 23.116 | 6.488 | 5.230 | 45M | 326.2 | [paper] [tf-slim] [torch-fb] |
ResNet152v2 | 22.236 | 6.080 | 4.960 | 60M | 455.2 | [paper] [tf-slim] [torch-fb] |
ResNet200v2 | 21.714 | 5.848 | 4.830 | 65M | 618.3 | [paper] [tf-slim] [torch-fb] |
ResNeXt50c32 | 22.260 | 6.190 | 5.410 | 25M | 267.4 | [paper] [torch-fb] |
ResNeXt101c32 | 21.270 | 5.706 | 4.842 | 44M | 427.9 | [paper] [torch-fb] |
ResNeXt101c64 | 20.506 | 5.408 | 4.564 | 84M | 877.8 | [paper] [torch-fb] |
WideResNet50 | 21.982 | 6.066 | 5.116 | 69M | 358.1 | [paper] [torch] |
Inception1 | 33.160 | 12.324 | 10.246 | 7.0M | 165.1 | [paper] [tf-slim] [caffe-zoo] |
Inception2 | 26.296 | 8.270 | 6.882 | 11M | 134.3 | [paper] [tf-slim] |
Inception3 | 22.102 | 6.280 | 5.038 | 24M | 314.6 | [paper] [tf-slim] [keras] |
Inception4 | 19.880 | 5.022 | 4.206 | 43M | 582.1 | [paper] [tf-slim] |
InceptionResNet2 | 19.744 | 4.748 | 3.962 | 56M | 656.8 | [paper] [tf-slim] |
NASNetAlarge | 17.502 | 3.996 | 3.412 | 94M | 2081 | [paper] [tf-slim] |
NASNetAmobile | 25.634 | 8.146 | 6.758 | 7.7M | 165.8 | [paper] [tf-slim] |
VGG16 | 28.732 | 9.950 | 8.834 | 138M | 348.4 | [paper] [keras] |
VGG19 | 28.744 | 10.012 | 8.774 | 144M | 399.8 | [paper] [keras] |
DenseNet121 | 25.480 | 8.022 | 6.842 | 8.1M | 202.9 | [paper] [torch] |
DenseNet169 | 23.926 | 6.892 | 6.140 | 14M | 219.1 | [paper] [torch] |
DenseNet201 | 22.936 | 6.542 | 5.724 | 20M | 272.0 | [paper] [torch] |
MobileNet25 | 48.418 | 24.208 | 21.196 | 0.48M | 29.27 | [paper] [tf-slim] |
MobileNet50 | 35.708 | 14.376 | 12.180 | 1.3M | 42.32 | [paper] [tf-slim] |
MobileNet75 | 31.588 | 11.758 | 9.878 | 2.6M | 57.23 | [paper] [tf-slim] |
MobileNet100 | 29.576 | 10.496 | 8.774 | 4.3M | 70.69 | [paper] [tf-slim] |
SqueezeNet | 45.566 | 21.960 | 18.578 | 1.2M | 71.43 | [paper] [caffe] |
- The object detection models can be coupled with any network, but mAPs could be measured only for the models with pre-trained weights. Note that YOLOv2VOC is equivalent to YOLOv2(inputs, Darknet19), TinyYOLOv2VOC to TinyYOLOv2(inputs, TinyDarknet19), FasterRCNN_ZF_VOC to FasterRCNN(inputs, ZF), and FasterRCNN_VGG16_VOC to FasterRCNN(inputs, VGG16, stem_out='conv5/3') (see the sketch after these notes).
- The mAPs were obtained with TensorNets on PASCAL VOC2007 test set and may slightly differ from the original ones.
- The test input sizes were the numbers reported as the best in the papers:
  - YOLOv2: 416x416
  - FasterRCNN: min_shorter_side=600, max_longer_side=1000
- Size: the rounded number of parameters
- The computation times were measured on NVIDIA Tesla P100 (3584 cores, 16 GB global memory) with cuDNN 6.0 and CUDA 8.0.
- Speed: milliseconds for the network inference only, on a single 416x416 image
- FPS: 1000 / speed
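As referenced in the first note above, the VOC-suffixed shorthands can be used directly instead of passing a stem network. A minimal sketch (assuming the pre-trained VOC weights are loaded via pretrained(), as in the YOLOv2 example earlier):
import tensorflow as tf
import tensornets as nets
inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
model = nets.YOLOv2VOC(inputs)  # the note above states this equals nets.YOLOv2(inputs, nets.Darknet19)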
Model | mAP | Size | Speed | FPS | References |
---|---|---|---|---|---|
YOLOv2VOC | 0.7320 | 51M | 14.75 | 67.80 | [paper] [darknet] [darkflow] |
TinyYOLOv2VOC | 0.5303 | 16M | 6.534 | 153.0 | [paper] [darknet] [darkflow] |
FasterRCNN_ZF_VOC | 0.4466 | 59M | 241.4 | 3.325 | [paper] [caffe] [roi-pooling] |
FasterRCNN_VGG16_VOC | 0.6872 | 137M | 300.7 | 4.143 | [paper] [caffe] [roi-pooling] |