Using Vector Loads #9

CUDA supports [vector loads/stores](https://developer.nvidia.com/blog/cuda-pro-tip-increase-performance-with-vectorized-memory-access/). Using them reduces the number of load/store instructions (from 3 to 1), which should help with the memory latency issues that profiling has shown in our functions.
Because NumPy defaults to row-major ordering, this also makes our data handling a lot easier: I no longer have to convert everything to column-major format. The change also simplifies slicing grid points, which is a nice bonus for the efficiency of our calculations.
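
To make the idea concrete, here is a minimal sketch of a vectorized access pattern, following the approach in the linked NVIDIA post. It is not this project's code: the kernel name `scale_vec4`, the scaling operation, and the array size are all hypothetical, and it assumes the buffers are 16-byte aligned (which holds for pointers returned by `cudaMalloc`). Each thread reads a `float4`, so the compiler can issue one 128-bit load in place of several scalar loads.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies n floats by s, four at a time.
// Assumes `in` and `out` are 16-byte aligned (true for cudaMalloc pointers).
__global__ void scale_vec4(const float* in, float* out, int n, float s)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    int n4 = n / 4;  // number of complete float4 groups

    // Reinterpret the scalar buffers as float4 so each access is vectorized.
    const float4* in4 = reinterpret_cast<const float4*>(in);
    float4* out4 = reinterpret_cast<float4*>(out);

    for (int i = idx; i < n4; i += stride) {
        float4 v = in4[i];   // one 128-bit load
        v.x *= s; v.y *= s; v.z *= s; v.w *= s;
        out4[i] = v;         // one 128-bit store
    }

    // Scalar loop for the leftover elements when n is not a multiple of 4.
    for (int i = 4 * n4 + idx; i < n; i += stride) {
        out[i] = in[i] * s;
    }
}

int main()
{
    const int n = 1027;  // deliberately not a multiple of 4
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale_vec4<<<4, 128>>>(d_in, d_out, n, 2.0f);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Verify every element against the scalar reference on the host.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (std::fabs(h_out[i] - 2.0f * h_in[i]) > 1e-6f) ++errors;
    }
    printf("mismatches: %d\n", errors);

    cudaFree(d_in);
    cudaFree(d_out);
    return errors == 0 ? 0 : 1;
}
```

The same reinterpretation would apply to a C-contiguous NumPy array copied to the device, since the components of each row sit next to each other in memory. Whether a given row width satisfies the alignment requirement depends on the actual data layout, which this sketch does not assume.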
