How does a PyTorch module do the back prop

While following the instructions on extending PyTorch by adding a module, I noticed that when extending Module, we don't really have to implement the backward function. The only thing we need is to apply the Function instance in the forward function, and PyTorch can automatically call the backward one of that Function instance when doing the back prop. This seems like magic to me, since we didn't even register the Function instance we used. I looked into the source code but didn't find anything related. Could anyone kindly point me to the place where all of this actually happens?

Asked Apr 01 '18 by NoSegfault


People also ask

How does PyTorch backward work?

The backward() method in PyTorch is used to calculate the gradients during the backward pass through the neural network. If we do not call this backward() method, the gradients are not calculated for the tensors. The gradient of a tensor is only calculated for tensors that have requires_grad set to True.
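
A minimal sketch of that behaviour (the values and variable names are my own illustration): calling backward() on a scalar output populates .grad only for tensors that require gradients.

import torch

x = torch.tensor(2.0, requires_grad=True)  # gradient will be tracked
y = torch.tensor(3.0)                      # requires_grad defaults to False

z = x * x * y      # z = x^2 * y = 12
z.backward()       # run the backward pass from the scalar output

print(x.grad)      # tensor(12.) since dz/dx = 2*x*y
print(y.grad)      # None, because y does not require gradients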

How does Autograd in PyTorch work?

Autograd is a reverse automatic differentiation system. Conceptually, autograd keeps a record of all of the operations that created the data as you execute them, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors.
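
A small sketch of inspecting that graph from Python (exact node class names may differ by version): grad_fn points at the operation that produced a tensor, and next_functions points back toward the leaves.

import torch

a = torch.randn(3, requires_grad=True)   # leaf of the graph
b = (a * 2).sum()                        # root of the graph

print(a.grad_fn)                         # None, leaves were not produced by an op
print(b.grad_fn)                         # e.g. <SumBackward0 object at ...>
print(b.grad_fn.next_functions)          # edge pointing back toward the mul node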

What does requires_grad=True do?

When tensors have requires_grad = True, they start forming a backward graph that tracks every operation applied to them in order to calculate the gradients, using something called a dynamic computation graph (DCG).
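
A quick illustrative sketch, assuming an ordinary recent PyTorch install: the same operation only grows the backward graph when requires_grad is set.

import torch

u = torch.ones(2)                        # requires_grad is False by default
v = torch.ones(2, requires_grad=True)

print((u * 3).grad_fn)   # None -> nothing is recorded
print((v * 3).grad_fn)   # e.g. <MulBackward0 ...> -> the op joined the backward graph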

How does automatic differentiation work in PyTorch?

PyTorch computes the gradient of a function with respect to the inputs by using automatic differentiation. Automatic differentiation is a technique that, given a computational graph, calculates the gradients of the inputs. Automatic differentiation can be performed in two different ways: forward and reverse mode.
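
Both modes are reachable from the Python API. The sketch below is my own example: torch.autograd.grad for reverse mode and torch.autograd.functional.jvp for forward mode, assuming a reasonably recent PyTorch version where the latter exists.

import torch

def f(x):
    return (x ** 2).sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Reverse mode: one backward pass gives the gradient w.r.t. all inputs.
(grad,) = torch.autograd.grad(f(x), x)
print(grad)                                  # tensor([2., 4.])

# Forward mode: a Jacobian-vector product along a chosen direction v.
v = torch.tensor([1.0, 0.0])
out, jvp = torch.autograd.functional.jvp(f, (x.detach(),), (v,))
print(jvp)                                   # tensor(2.) -> directional derivative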


1 Answer

Not having to implement backward() is the reason PyTorch or any other DL framework is so valuable. In fact, implementing backward() should only be done in very specific cases where you need to mess with the network's gradient (or when you create a custom Function that can't be expressed using PyTorch's built-in functions).
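
For reference, here is a hedged sketch of such a custom Function using the current staticmethod-style torch.autograd.Function API (the instructions the question follows predate it and used Function instances); the scaled-exponential op itself is just an illustration.

import torch

class ScaledExp(torch.autograd.Function):
    """Computes exp(x) * scale with a hand-written backward."""

    @staticmethod
    def forward(ctx, x, scale):
        y = torch.exp(x) * scale
        ctx.save_for_backward(y)        # stash what backward will need
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # d(exp(x) * scale)/dx = exp(x) * scale, which is y itself
        return grad_output * y, None    # None: no gradient for `scale`

x = torch.randn(4, requires_grad=True)
out = ScaledExp.apply(x, 2.0).sum()
out.backward()
print(x.grad)                           # equals 2 * torch.exp(x)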

PyTorch computes backward gradients using a computational graph, which keeps track of what operations have been done during your forward pass. Any operation done on a Variable implicitly gets registered there. Then it's a matter of traversing the graph backwards from the variable where backward was called, and applying the derivative chain rule to compute the gradients.
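
You can watch that traversal from Python. In this illustrative sketch, autograd's result matches the product rule applied by hand, and the registered operations show up on the output's grad_fn chain (exact node class names may vary).

import torch

x = torch.tensor(1.5, requires_grad=True)
z = torch.sin(x) * x                     # two recorded operations: sin and mul

z.backward()                             # traverse the graph from z back to x
manual = torch.cos(x.detach()) * x.detach() + torch.sin(x.detach())  # product rule by hand
print(x.grad, manual)                    # both roughly tensor(1.1036)

# The recorded nodes are visible on the output:
print(z.grad_fn)                         # e.g. <MulBackward0 ...>
print(z.grad_fn.next_functions)          # edges toward the sin node and the leaf x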

PyTorch's About page has a nice visualization of the graph and how it generally works. I'd also recommend looking up computational graphs and autograd mechanics on Google if you want more details.

EDIT: The source code where all this happens would be in the C++ part of PyTorch's codebase, where the actual graph is implemented. After some digging around, I found this:

/// Evaluates the function on the given inputs and returns the result of the
/// function call.
variable_list operator()(const variable_list& inputs) {
    profiler::RecordFunction rec(this);
    if (jit::tracer::isTracingVar(inputs)) {
        return traced_apply(inputs);
    }
    return apply(inputs);
}

So for each Function, PyTorch first checks if its inputs need tracing, and performs traced_apply() as implemented here. You can see the node being created and appended to the graph:

// Insert a CppOp in the trace.
auto& graph = state->graph;
std::vector<VariableFlags> var_flags;
for(auto & input: inputs) {
    var_flags.push_back(VariableFlags::of(input));
}
auto* this_node = graph->createCppOp(get_shared_ptr(), std::move(var_flags));
// ...
for (auto& input: inputs) {
    this_node->addInput(tracer::getValueTrace(state, input));
}
graph->appendNode(this_node);

My best guess here is that every Function object registers itself and its inputs (if needed) upon execution. Every non-functional call (e.g. variable.dot()) simply defers to the corresponding Function, so this still applies.
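
That deferral is visible from Python too; in this small sketch (node class names may differ across versions), a plain method call like Tensor.dot still ends up registering a backward node on its result.

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3)

c = a.dot(b)          # a Tensor method, not an explicit autograd.Function call
print(c.grad_fn)      # e.g. <DotBackward0 ...> -> a node was still registered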

NOTE: I don't take part in PyTorch's development and am in no way an expert on its architecture. Any corrections or additions would be welcome.

Answered Nov 01 '22 by Mach_Zero