
Summary on Supporting PyTorch


Workflow description of ElasticDL

About the start command:

elasticdl train --image_name=elasticdl:mnist (see tutorials/elasticdl_local.md)

Setup entry point: elasticdl=elasticdl_client.main:main
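The entry point above is the standard setuptools console-script form. As a reference, a minimal sketch of how it would be declared in setup.py (the package name and metadata here are illustrative assumptions; only the entry_points line comes from the note above):

# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="elasticdl_client",
    packages=find_packages(),
    entry_points={
        "console_scripts": ["elasticdl=elasticdl_client.main:main"],
    },
)

Installing such a package gives an elasticdl executable that dispatches to elasticdl_client.main:main, which is what the elasticdl train command above invokes.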

Create master:

elasticdl_client/api.py

Run worker:

Runs the task (training/evaluation/prediction). The worker only computes the gradients and reports them to the PS.

worker/main.py

elastic/python/worker/worker.py

PS Client:

Push parameters to the PS: elastic/python/worker/ps_client.py
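A rough, self-contained sketch of the worker/PS division of labour described above. LocalPSStub is a hypothetical in-memory stand-in for the real PS client in elastic/python/worker/ps_client.py (whose actual interface may differ); only the split of responsibilities (workers compute gradients, the PS holds and updates the parameters) is taken from the notes above:

import torch
import torch.nn as nn

class LocalPSStub:
    """Hypothetical stand-in for a PS client: holds parameters, applies pushed gradients."""
    def __init__(self, model, lr=0.1):
        self.params = {name: p.detach().clone() for name, p in model.named_parameters()}
        self.lr = lr

    def pull_parameters(self, model):
        # copy the server-side parameters into the worker's local model
        with torch.no_grad():
            for name, p in model.named_parameters():
                p.copy_(self.params[name])

    def push_gradients(self, grads):
        # the "server" applies the update; the worker never calls optimizer.step()
        with torch.no_grad():
            for name, g in grads.items():
                self.params[name] -= self.lr * g

model = nn.Linear(4, 2)
ps = LocalPSStub(model)
x, y = torch.randn(8, 4), torch.randn(8, 2)

ps.pull_parameters(model)                       # worker refreshes weights from the PS
model.zero_grad()
nn.functional.mse_loss(model(x), y).backward()  # worker computes gradients only
ps.push_gradients({n: p.grad for n, p in model.named_parameters()})  # report, do not apply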

How to print gradient information directly in PyTorch.

Usually, we train in PyTorch with an optimizer.

# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # train_loader yields normalized batches (b_x, b_y)

        output = cnn(b_x)[0]            # CNN output
        loss = loss_func(output, b_y)   # cross-entropy loss
        optimizer.zero_grad()           # clear gradients from the previous step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

In the ElasticDL framework, worker nodes pull data from the master and compute gradients without applying them, while the parameter servers provide the current parameters to the workers. A worker needs to send the gradient information back to the PS rather than apply it locally, so the gradients must be read out of the model after the backward pass.
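For the leaf parameters of a model this is straightforward: after loss.backward() the gradients sit in each parameter's .grad field and can be read out without ever calling optimizer.step(). A small illustrative example (not ElasticDL code):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()                          # compute gradients, do not apply them

for name, p in model.named_parameters():
    print(name, p.grad)                  # gradient tensors a worker would report to the PS

Intermediate (non-leaf) tensors, however, do not keep their gradients by default; obtaining them is the subject of the next section.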

Calculating the gradient for non-leaf node variables

a→b→c→d
    ↓
    e

Generally, only the gradients of leaf nodes are computed and kept. The gradients of non-leaf nodes such as b and c are not explicitly retained during the backward pass (since in general only the leaf nodes need to be updated), which saves a large amount of memory. During debugging, however, we sometimes need to monitor the gradients of intermediate variables to make sure the network is working correctly.

Two methods to print out the gradient of a non-leaf node: Tensor.retain_grad() and hooks.

Tensor.retain_grad() makes autograd keep the gradient of a non-leaf node, at the cost of additional (GPU) memory. A hook, in contrast, is called during the backward pass itself and can print or store the gradient directly, so it does not increase memory consumption; retain_grad() is more convenient to use than a hook.

# Tensor.retain_grad
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()                 # keep the gradient of the non-leaf tensor y
z = y * y * 3
out = z.mean()
out.backward()
print(y.grad)

> tensor([[4.5000, 4.5000],
>         [4.5000, 4.5000]])
# hook
import torch

grads = {}
def save_grad(name):
    def hook(grad):
        grads[name] = grad      # store the gradient under the given name
    return hook

x = torch.randn(1, 1, requires_grad=True)
y = 3 * x
z = y ** 2

# save_grad('y') returns a hook (a closure) that stores the gradient under the key 'y'
y.register_hook(save_grad('y'))
z.register_hook(save_grad('z'))
z.backward()

print(grads['y'])
print(grads['z'])
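The same hook pattern can be applied to every parameter of a model, which is one way to capture all gradients as they are produced during the backward pass (again a sketch, not the ElasticDL implementation). Note that register_hook returns a handle whose remove() method detaches the hook when it is no longer needed:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
grads = {}

def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

# register one hook per parameter and keep the handles so the hooks can be removed later
handles = [p.register_hook(save_grad(name)) for name, p in model.named_parameters()]

model(torch.randn(8, 4)).sum().backward()
print({name: g.shape for name, g in grads.items()})

for h in handles:
    h.remove()   # detach the hooks once the gradients have been collected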