How to transpose a matrix in an optimal way using blas?

1 Answer

BLAS doesn't have a matrix transpose routine built in. The CUDA SDK includes a matrix transpose example, along with a paper that discusses the optimal strategy for performing a transpose. Your best strategy is probably to pass row-major inputs to CUBLAS using the transpose-input versions of the calls, perform the intermediate calculations in column-major order, and finally transpose the result using the SDK transpose kernel. This works because a row-major matrix, read as column-major, is exactly its own transpose, so the transpose flag on the CUBLAS call recovers the original matrix without moving any data.
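The SDK example's central idea is tiling: it stages square tiles of the matrix through shared memory so that both the reads and the writes are coalesced. A rough CPU analogue is cache blocking. The sketch below is only that analogue in plain C, not the SDK kernel itself; the function name transpose_blocked and the tile size of 32 are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Tile edge length: 32 is an assumed value, tune it for your cache. */
#define TILE 32

/* Out-of-place transpose of a rows x cols row-major matrix `in`
 * into a cols x rows row-major matrix `out`.
 * The matrix is processed tile by tile so that the reads from `in`
 * and the strided writes to `out` both stay inside a small,
 * cache-resident block (the CPU counterpart of the shared-memory
 * tiles used by the CUDA SDK transpose example). */
static void transpose_blocked(const float *in, float *out,
                              size_t rows, size_t cols)
{
    for (size_t ib = 0; ib < rows; ib += TILE) {
        for (size_t jb = 0; jb < cols; jb += TILE) {
            /* Clamp the tile at the matrix edges. */
            size_t imax = ib + TILE < rows ? ib + TILE : rows;
            size_t jmax = jb + TILE < cols ? jb + TILE : cols;
            for (size_t i = ib; i < imax; ++i)
                for (size_t j = jb; j < jmax; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
    }
}
```

For a rows x cols input, out must hold cols x rows floats; given the 2 x 3 matrix [1 2 3; 4 5 6] stored as {1, 2, 3, 4, 5, 6}, the function writes {1, 4, 2, 5, 3, 6}, i.e. the 3 x 2 matrix [1 4; 2 5; 3 6].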

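Edit: CUBLAS version 5 added a routine, geam, that can perform matrix transposition in GPU memory, and it should be regarded as optimal for whatever architecture you are using. geam computes C = alpha * op(A) + beta * op(B), so calling it with op(A) set to transpose, alpha = 1 and beta = 0 produces the transpose of A.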
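To make the geam semantics concrete, here is a minimal CPU reference sketch in C, assuming BLAS-style column-major storage. The names op_t, OP_N, OP_T, geam_ref and transpose_via_geam are hypothetical and are not the cuBLAS API; on the GPU the corresponding call would be cublasSgeam with transa = CUBLAS_OP_T and beta = 0 (check the cuBLAS documentation for its exact requirements on the B argument in that case).

```c
#include <stddef.h>

/* Hypothetical operation flags; they mirror, but are not,
 * cuBLAS's CUBLAS_OP_N / CUBLAS_OP_T. */
typedef enum { OP_N, OP_T } op_t;

/* Element (i, j) of a column-major matrix with leading dimension ld. */
#define AT(M, ld, i, j) ((M)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Unoptimized reference for C = alpha * op(A) + beta * op(B),
 * where C is m x n and op(X) is X (OP_N) or X^T (OP_T).
 * This reference does not read B when beta == 0, so B may be NULL. */
static void geam_ref(op_t transa, op_t transb, int m, int n,
                     float alpha, const float *A, int lda,
                     float beta, const float *B, int ldb,
                     float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float a = transa == OP_N ? AT(A, lda, i, j) : AT(A, lda, j, i);
            float v = alpha * a;
            if (beta != 0.0f) {
                float b = transb == OP_N ? AT(B, ldb, i, j) : AT(B, ldb, j, i);
                v += beta * b;
            }
            AT(C, ldc, i, j) = v;
        }
    }
}

/* Transpose via geam: A is rows x cols (lda = rows),
 * C becomes cols x rows (ldc = cols); alpha = 1, beta = 0. */
static void transpose_via_geam(const float *A, int rows, int cols, float *C)
{
    geam_ref(OP_T, OP_N, cols, rows, 1.0f, A, rows, 0.0f, NULL, cols, C, cols);
}
```

With beta = 0 the second operand drops out and geam reduces to a pure out-of-place transpose, which is why it covers the case plain BLAS leaves out.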


answered Sep 16 '22 21:09

talonmies


Tags: c, cuda, blas, cublas

Martin Kristiansen
