[WIP] initial curand implementation for model init #741

ngc92 · 2024-08-13T11:01:07Z

As an alternative to the multi-threaded model init, this uses cuRAND to generate the initial weights directly on the GPU.
It is still a work in progress: it needs error checking, and I dislike the cudaMalloc inside the loop, but it should be enough to measure how much speed-up we would get.

ngc92 · 2024-08-14T22:32:41Z

cudaMemset operates on individual bytes, but here we want to set floats or bf16s. Zero is the convenient special case: the all-zero bit pattern also represents the number zero, so cudaMemset can initialize floats to 0.0. For any other value, it only works if the desired float has the same bits in every byte.

initial curand implementation for model init

29a6b1c
