Conversation
…llocator behavior of the remote_stream CUDA impl which may differ from the ROCm impl.
|
Just a nudge on this @GMNGeoffrey and @gcapodagAMD |
GMNGeoffrey
left a comment
There was a problem hiding this comment.
Sorry, I missed this before
| # Install basic tools | ||
| RUN apt-get update \ | ||
| && apt install -y clang-format-15 ninja-build tree | ||
| && apt install -y clang-format-15 git ninja-build tree |
There was a problem hiding this comment.
Strangely enough, git was missing and installing the graphbolt deps would crash because hipcollections would pull in it's own deps like libhipcxx with git and crash. I'm not sure why this wasn't popping up before. Maybe I didn't catch it with #16?
| # g3 will reuse g2's memory block, resulting a wrong result | ||
| g3 = rand_graph(10, 20, device=F.ctx()) | ||
| g3.create_formats_() |
There was a problem hiding this comment.
Yeah this looks like it's verifying that a bug exists or something... I agree it's probably not worth trying to replicate it, although it's not clear why it wasn't failing before...
There was a problem hiding this comment.
Agreed about it not being clear. Something smells fishy, but I think it's small enough that we can skip
Description
I noticed in some local testing that the following tests were failing:
test_OnDiskDataset_heterogeneous[csv-True-True]test_OnDiskDataset_heterogeneous[csv-True-False]test_OnDiskDataset_heterogeneous[csv-False-True]test_OnDiskDataset_heterogeneous[csv-False-False]test_OnDiskDataset_preprocess_graph_with_single_typetest_record_stream_graph_negativeDetails
The
OnDiskDatasetissues are caused by behavior in the pandas > 3 releases (which explains why they didn't show up in the last release of DGL. You can't modify values for certain things in place, so we just have to tweak two lines:The
test_record_stream_graph_negativetest, I'm less sure about and it seems like it's relying on the underlying allocator to reuse blocks of memory, which doesn't seem to happen for us in my local testing. I'm testing with the same dependencies we supported in the last release so I'm not sure that anything changed on the rocm side. It seems like a potentially flaky test, and I'm not sure it's worth our time to dig into. @GMNGeoffrey what do you think?Test results
Quick summary of the test results:
Checklist
Please feel free to remove inapplicable items for your PR.