Reduce backward memory to ~half #218
Open
jasam-sheja wants to merge 5 commits into mapillary:main from
Conversation
- reuse `y_act_` and `dy_act_`
- use in-place calculations in `forward_cpu` and `backward_cpu`
- make sure `dy_act` doesn't have overlapping memory
- reflect the in-place operations in the doc and comments
Pull request overview
This PR reduces memory usage during the backward pass of InPlaceABN by reusing existing activation/gradient buffers (overwriting y_act with xhat and dy_act with dy) and adjusting code paths to support in-place behavior (including an ELU backward tweak and ensuring dy_act is contiguous).
Changes:
- Reuse `y_act`/`dy_act` as `xhat`/`dy` in backward-reduce (CPU/CUDA) to cut temporary allocations.
- Switch several CPU forward/backward intermediate computations to in-place ops to reduce transient allocations (a generic illustration follows this list).
- Update Python/C++ binding notes and make `dy_act` contiguous to support in-place writes.
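As an illustration of the in-place bullet above (a generic ATen sketch, not the PR's actual code), each out-of-place op in a chain allocates a fresh temporary, while the in-place variants reuse the existing buffer:

```cpp
#include <ATen/ATen.h>

void inplace_vs_outofplace_demo() {
  // 3-D (N, C, S) view, similar to what the InPlaceABN kernels operate on.
  at::Tensor x = at::randn({2, 8, 16});
  at::Tensor mean = x.mean({0, 2}, /*keepdim=*/true);  // per-channel mean, shape (1, 8, 1)

  at::Tensor y = (x - mean) * 0.5;  // out-of-place: temporaries for (x - mean) and for y
  x.sub_(mean).mul_(0.5);           // in-place: writes the result back into x's storage
}
```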
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/inplace_abn_cuda.cu` | Reuses `y_act`/`dy_act` buffers for backward outputs in the CUDA implementation. |
| `src/inplace_abn_cpu.cpp` | Reuses `y_act_`/`dy_act_` buffers and increases in-place usage for CPU forward/backward. |
| `src/inplace_abn.cpp` | Updates the pybind docstring to mention in-place behavior for `backward_reduce`. |
| `inplace_abn/functions.py` | Makes `dy_act` contiguous and notes that backward-reduce overwrites tensors in-place. |
| `include/inplace_abn.h` | Adjusts ELU backward ordering to support in-place overwrite safely (see the sketch below this table). |
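Regarding the `include/inplace_abn.h` row above: when the recovered input and the gradient are written into the very buffers that hold `y_act` and `dy_act`, the per-element order of operations matters. A minimal standalone sketch of the idea (not the repository's actual helper; the scalar formulation and names are illustrative):

```cpp
#include <cmath>

// For ELU with y = alpha * (exp(x) - 1) on the negative branch:
//   dy/dx = y + alpha           (gradient expressed through the activated value)
//   x     = log1p(y / alpha)    (inverse of the activation)
// With in-place reuse, the gradient must be formed from y *before* y is
// overwritten with the recovered input.
inline void elu_backward_inplace(float& y, float& dy, float alpha) {
  if (y < 0.f) {
    dy = dy * (y + alpha);      // read the activated value first ...
    y = std::log1p(y / alpha);  // ... then overwrite it
  }
}
```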
Comments suppressed due to low confidence (1)
src/inplace_abn_cuda.cu:156
`y_act`/`dy_act` are being reused as `xhat`/`dy`, but the CUDA kernel is launched with `at::RestrictPtrTraits` for both input and output accessors. When `y_act_accessor` aliases `xhat_accessor` (and `dy_act_accessor` aliases `dy_accessor`), this violates the `__restrict__` aliasing assumption and can lead to miscompilation/incorrect results. To support in-place reuse safely, use non-restrict pointer traits (e.g., `at::DefaultPtrTraits`) for these accessors/kernel params, or keep separate output tensors (or provide a separate non-restrict kernel for the in-place path).
auto &xhat = y_act; // reuse
auto &dy = dy_act; // reuse
auto sum_dy = at::empty({chn}, acc_options);
auto sum_xhat_dy = at::empty({chn}, acc_options);
// Make accessors
auto y_act_accessor = y_act.packed_accessor<scalar_t, 3, at::RestrictPtrTraits, index_t>();
auto dy_act_accessor = dy_act.packed_accessor<scalar_t, 3, at::RestrictPtrTraits, index_t>();
auto xhat_accessor = xhat.packed_accessor<scalar_t, 3, at::RestrictPtrTraits, index_t>();
auto dy_accessor = dy.packed_accessor<scalar_t, 3, at::RestrictPtrTraits, index_t>();
auto weight_accessor = packed_accessor_or_dummy<prmscalar_t, 1, at::RestrictPtrTraits, index_t>(weight);
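If the buffer reuse is kept, one way to address the aliasing concern is to drop the `__restrict__` promise for the accessors that can alias, as the comment suggests. A minimal sketch, assuming it lives in the same dispatch scope as the snippet above (so `scalar_t` and `index_t` are already defined); the kernel's template parameters would need the matching trait change:

```cpp
// Non-restrict accessors for the tensors that alias each other in the in-place
// path: at::DefaultPtrTraits makes no no-aliasing promise, so the compiler may
// not assume y_act/xhat (or dy_act/dy) point to distinct memory.
auto y_act_accessor  = y_act.packed_accessor<scalar_t, 3, at::DefaultPtrTraits, index_t>();
auto dy_act_accessor = dy_act.packed_accessor<scalar_t, 3, at::DefaultPtrTraits, index_t>();
auto xhat_accessor   = xhat.packed_accessor<scalar_t, 3, at::DefaultPtrTraits, index_t>();
auto dy_accessor     = dy.packed_accessor<scalar_t, 3, at::DefaultPtrTraits, index_t>();
// Truly independent read-only inputs (e.g. weight) can keep at::RestrictPtrTraits.
```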
Comment on lines 25 to 35
std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> backward_reduce_impl(
    const at::Tensor& y_act_,
    const at::Tensor& dy_act_,
    const std::optional<at::Tensor>& weight_,
    const std::optional<at::Tensor>& bias_,
    float eps,
    float activation_param) {
  // Initialize output tensors
- auto xhat_ = at::empty_like(y_act_);
- auto dy_ = at::empty_like(y_act_);
+ auto &xhat_ = y_act_; // reuse
+ auto &dy_ = dy_act_; // reuse
  auto sum_dy_ = at::zeros({y_act_.size(1)}, y_act_.options());
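Since `at::Tensor` is a reference-counted handle, binding `xhat_`/`dy_` as references to the inputs removes the two `at::empty_like` allocations and makes the subsequent kernels write directly into the caller's saved buffers. A hypothetical sanity check of that aliasing (not part of the PR) could look like:

```cpp
// After the in-place backward_reduce, the returned xhat/dy should share storage
// with the saved activation/gradient tensors.
TORCH_CHECK(xhat_.is_alias_of(y_act_), "xhat is expected to reuse y_act's storage");
TORCH_CHECK(dy_.is_alias_of(dy_act_), "dy is expected to reuse dy_act's storage");
```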
# Call backward_reduce if we need to compute at least one of the gradients
if any(ctx.needs_input_grad):
    # remove memory overlaping to allow for in-place operation
  // Backward methods
- m.def("backward_reduce", &backward_reduce, "First step of the backward pass");
+ m.def("backward_reduce", &backward_reduce, "First step of the backward pass. This is an in-place operation w.r.t. y_act, dy_act,");
Comment on lines 117 to 125
y_act, var, count, weight, bias = ctx.saved_tensors

# Call backward_reduce if we need to compute at least one of the gradients
if any(ctx.needs_input_grad):
    # remove memory overlaping to allow for in-place operation
    dy_act = dy_act.contiguous()
    # This overwrites y_act with xhat and dy_act with dy
    xhat, dy, sum_dy_local, sum_xhat_dy_local = _backend.backward_reduce(
        y_act,
        dy_act,
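The `dy_act.contiguous()` call matters because the incoming gradient can be a broadcast/expanded view whose elements share memory, and writing into such a tensor in place is unsafe (PyTorch's own in-place ops refuse to do it). A small standalone ATen illustration of that failure mode (not the PR's code):

```cpp
#include <ATen/ATen.h>

void overlap_demo() {
  // All 8 rows of g alias the same 4 underlying elements (stride 0 along dim 0).
  at::Tensor g = at::ones({1, 4}).expand({8, 4});
  // g.mul_(2);  // would throw: multiple output elements map to one memory location
  at::Tensor g_safe = g.contiguous();  // dense copy that is safe to overwrite
  g_safe.mul_(2);                      // fine
}
```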
Reuse the grad and input tensors in the backward pass instead of creating new ones.
- Mainly reuse `y_act` for `xhat` and `dy_act` for `dy`.
- Ensure every function supports in-place operation (ELU is modified accordingly).
- Ensure the tensors allow in-place operation (`dy_act` has to be contiguous).

Needs more testing; however, there are no unit tests.