I tested cuFFT's CUFFT_R2C and CUFFT_C2R transforms. Both appear to be unstable. Increasing the output array size can make CUFFT_R2C work properly. CUFFT_C2R, however, almost never works unless it is given the exact spectrum calculated by CUFFT_R2C. My tests suggest these two transforms are terrible. CUFFT_C2C, by contrast, is much more stable.
I used a random matrix, denoted by $x$, of dimension Nx-by-Ny to test both transforms. According to the cuFFT documentation, the complex-valued data should be of size (Nx/2+1)-by-Ny. The suggested dimension only worked for Nx and Ny < 128. When Nx=Ny=256, the image spectrum contains conspicuous artifacts, as shown below. For Nx=Ny=1024, the results are total garbage. Though not shown here, the spectra calculated with CUFFT_C2C are very accurate compared with the ground truth calculated in Matlab.



Left: random test image ($x$, 256-by-256 pixels). Middle: a quarter of the image spectrum calculated with CUFFT_R2C. Right: errors of the middle image relative to the ground truth calculated in Matlab.
Note that the above error CAN be corrected by increasing the size of the complex-valued data from (Nx/2+1)*Ny to Nx*Ny elements. The cuFFT documentation states that “Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms.” I have no idea how to ALIGN data to a data type. It may imply that the complex-valued data should be of dimension Nx*Ny. (I am not a native speaker, so I am not going to guess the implications.)
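For reference, here is a small sketch of the size bookkeeping for a half spectrum. It assumes a row-major layout in which the last dimension varies fastest and is the one that gets halved, which is how I read the cuFFT description of 2D real-to-complex transforms. The helper names `r2c_output_shape` and `half_spectrum_index` are made up for illustration and are not part of cuFFT. If the halved dimension or the row length is mixed up (for example when coming from Matlab's column-major storage), the element count still matches but the indexing does not, which may be worth ruling out.

```python
def r2c_output_shape(n_slow, n_fast):
    """Shape of the non-redundant half spectrum, assuming the fastest-varying
    (last) dimension is the one that is halved. Hypothetical helper."""
    return (n_slow, n_fast // 2 + 1)


def half_spectrum_index(i_slow, k_fast, n_fast):
    """Flat row-major offset of element (i_slow, k_fast) in the half spectrum.
    Each row holds n_fast // 2 + 1 complex values, not n_fast."""
    row_len = n_fast // 2 + 1
    return i_slow * row_len + k_fast


shape = r2c_output_shape(256, 256)
count = shape[0] * shape[1]
print(shape, count)                      # shape and element count of the half spectrum
print(half_spectrum_index(1, 0, 256))    # offset of the first element of the second row
```

Under these assumptions the second row of a 256-by-256 half spectrum starts at offset 129, not 256, so a buffer indexed with a row length of 256 would read the wrong elements even if it is large enough.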
CUFFT_C2R is trickier. Unless I used the image spectrum (of dimensions Nx-by-Ny) calculated by CUFFT_R2C, I always got garbage. This suggests that not only the first (Nx/2+1)*Ny elements but all Nx*Ny elements affect the results.
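One property that may be related: the spectrum of a real signal is Hermitian-symmetric, $X[N-k] = \overline{X[k]}$, and a complex-to-real transform relies on that symmetry. The sketch below uses a naive 1D DFT in plain Python (not cuFFT) to show that the first N/2+1 bins are enough to rebuild a real signal, and that a spectrum without the symmetry does not invert to a real signal at all. The function names are hypothetical, and this says nothing about what cuFFT does internally with any extra elements.

```python
import cmath
import random


def dft(x):
    """Naive unnormalized forward DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_unscaled(X):
    """Naive unnormalized inverse DFT (no 1/n factor)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            for t in range(n)]


def expand_half_spectrum(half, n):
    """Rebuild the full length-n spectrum from the first n // 2 + 1 bins
    using the Hermitian symmetry X[n - k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


random.seed(0)
n = 8
x = [random.random() for _ in range(n)]
X = dft(x)

# Only the first n // 2 + 1 bins are kept, as in a real-to-complex transform.
half = X[: n // 2 + 1]
full = expand_half_spectrum(half, n)
y = [v.real / n for v in idft_unscaled(full)]
max_err = max(abs(a - b) for a, b in zip(x, y))

# A spectrum that is NOT Hermitian-symmetric has no real-valued inverse.
non_hermitian = [0j] * n
non_hermitian[1] = 1j
max_imag = max(abs(v.imag) for v in idft_unscaled(non_hermitian))

print(max_err, max_imag)
```

If an arbitrary complex array is passed where a Hermitian half spectrum is expected, the output cannot be interpreted as the inverse transform of that array, which might look like garbage.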
In summary, this took me a whole weekend. I got little help from either the cuFFT documentation or the internet. I will not use CUFFT_C2R or CUFFT_R2C. If anybody knows the trick, please let me know. My conclusion is that CUFFT_R2C and CUFFT_C2R are TERRIBLE.
Sometimes even CUFFT_C2C is not stable, in which case CUFFT_Z2Z is needed.
CUFFT_INVERSE is not reliable either. I need to call cuFFT 3 times to do the inverse (with a scaling factor).
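On the scaling factor: with unnormalized transforms, which is how cuFFT is documented to behave, a forward transform followed by an inverse returns the input multiplied by the number of elements N. The sketch below checks this with a naive 1D DFT in plain Python (not cuFFT). It also checks a purely mathematical identity: applying the unnormalized forward transform three times to a spectrum gives the original signal times N squared. This is a sketch under those assumptions, and the function name `dft` and the `sign` parameter are illustrative.

```python
import cmath


def dft(x, sign):
    """Naive unnormalized DFT. sign=-1 is the forward transform,
    sign=+1 is the inverse without the 1/n factor."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [1.0, 2.0, 0.5, -1.0]
n = len(x)
X = dft(x, -1)

# Forward then unnormalized inverse: the result is n * x, so divide by n.
roundtrip = dft(X, +1)
recovered = [v.real / n for v in roundtrip]

# Three forward transforms applied to X: the result is n**2 * x.
triple = dft(dft(dft(X, -1), -1), -1)
recovered_triple = [v.real / n**2 for v in triple]

print(recovered)
print(recovered_triple)
```

In both cases the output has to be divided by a factor that depends on the transform size (N for a single inverse call, N squared for three forward calls) before it matches the original signal.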