# Summary on Supporting PyTorch
- Run training locally: `elasticdl train --image_name=elasticdl:mnist` (see tutorials/elasticdl_local.md).
- Client entry point in setup: `elasticdl=elasticdl_client.main:main`.
- The worker runs tasks (training/evaluation/prediction); it only computes gradients and reports them to the PS (see the sketch after this list): `elastic/python/worker/worker.py`.
- Push parameters to the PS: `elastic/python/worker/ps_client.py`.
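Putting these pieces together, the worker-side loop conceptually looks as follows. This is a hypothetical sketch: `pull_parameters` and `push_gradients` are illustrative names, not the actual ElasticDL API (see `elastic/python/worker/ps_client.py` for the real interface).

```python
# Hypothetical sketch of the worker loop; pull_parameters and push_gradients
# are illustrative names, not the real ElasticDL ps_client API.
def run_training_task(model, loss_func, data_loader, ps_client):
    for b_x, b_y in data_loader:
        ps_client.pull_parameters(model)   # refresh local parameters from the PS
        loss = loss_func(model(b_x), b_y)
        model.zero_grad()
        loss.backward()                    # compute gradients locally
        ps_client.push_gradients(model)    # report gradients to the PS; no local update
```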
Usually, we train in PyTorch with an optimizer:
```python
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):  # gives batch data; x is normalized when iterating train_loader
        output = cnn(b_x)[0]           # cnn output
        loss = loss_func(output, b_y)  # cross entropy loss
        optimizer.zero_grad()          # clear gradients for this training step
        loss.backward()                # backpropagation, compute gradients
        optimizer.step()               # apply gradients
```
In the ElasticDL framework, worker nodes pull training data from the master and compute gradients without applying them. Meanwhile, the parameter servers (PS) provide the model parameters to the workers. A worker needs to send the gradient information back to the PS rather than apply it locally, so we need a way to extract the gradients from the model.
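In plain PyTorch terms, this means running the forward and backward pass but skipping `optimizer.step()`, and reading the gradients off the parameters instead. A minimal sketch, with `model`, `loss_func`, and the batch tensors as placeholders:

```python
def compute_gradients(model, loss_func, b_x, b_y):
    """One forward/backward pass; return gradients instead of applying them."""
    model.zero_grad()                  # clear gradients from the previous step
    loss = loss_func(model(b_x), b_y)
    loss.backward()                    # compute gradients; no optimizer.step()
    # collect {parameter name: gradient tensor} so the worker can report them to the PS
    return {name: p.grad.detach().clone()
            for name, p in model.named_parameters()
            if p.grad is not None}
```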
```
a→b→c→d
↓
e
```
Generally, only the gradients of leaf nodes are kept. Intermediate nodes such as b and c do not explicitly retain their gradients during the computation (since normally only the leaf nodes need to be updated), which saves a large amount of memory. During debugging, however, we sometimes need to monitor the gradients of intermediate variables to make sure the network is working correctly.
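To see this behavior concretely, here is a minimal check (the tensors mirror the `retain_grad()` example below): after `backward()`, only the leaf tensor `x` keeps a `.grad`, while the intermediate tensor `y` does not.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # leaf node
y = x + 2                                 # intermediate (non-leaf) node
out = (y * y * 3).mean()
out.backward()
print(x.grad)  # tensor([[4.5000, 4.5000], [4.5000, 4.5000]]) -- kept for the leaf node
print(y.grad)  # None -- the intermediate gradient is freed (PyTorch also emits a warning)
```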
There are two ways to print the gradient of a non-leaf node: `Tensor.retain_grad()` and hooks. `Tensor.retain_grad()` keeps the gradient of the non-leaf node at the cost of extra GPU memory, while a hook function captures the gradient directly during the backward pass and therefore does not increase memory consumption; `retain_grad()` is more convenient to use than a hook.
```python
import torch

# Tensor.retain_grad
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()      # keep the gradient of the non-leaf tensor y
z = y * y * 3
out = z.mean()
out.backward()
print(y.grad)
# tensor([[4.5000, 4.5000],
#         [4.5000, 4.5000]])
```
```python
import torch

# hook
grads = {}

def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

x = torch.randn(1, 1, requires_grad=True)
y = 3 * x
z = y ** 2
# save_grad('y') returns a hook (a closure) that stores the gradient under the key 'y'
y.register_hook(save_grad('y'))
z.register_hook(save_grad('z'))
z.backward()
print(grads['y'])
print(grads['z'])
```