How to Get Properties of your CUDA Device?

 

If you’re getting into High Performance Computing (HPC), you’re unlikely to miss the name CUDA. I’m just getting into it myself. NVIDIA hardware from the GeForce 8 Series onward supports CUDA programming. You may be interested in various parameters of your CUDA hardware, from the version (compute capability) to the number of multiprocessors, the size of shared memory, global memory and so on. How can you get these details? You can also have multiple GPUs in one machine, so how do you enumerate the properties of each CUDA device? Here’s a sample CUDA snippet. Save it to disk and compile it with the NVIDIA CUDA compiler (nvcc.exe, which ships with the CUDA Toolkit).

[sourcecode language='cpp']
// Device properties of CUDA hardware
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

// Print the interesting fields of one device.
// Fields of type size_t are printed with %zu rather than %d.
void DisplayProperties( const cudaDeviceProp* pDeviceProp )
{
    if( !pDeviceProp )
        return;

    printf( "\nDevice Name \t - %s ", pDeviceProp->name );
    printf( "\n**************************************" );
    printf( "\nTotal Global Memory\t\t - %zu KB", pDeviceProp->totalGlobalMem / 1024 );
    printf( "\nShared memory available per block \t - %zu KB", pDeviceProp->sharedMemPerBlock / 1024 );
    printf( "\nNumber of registers per thread block \t - %d", pDeviceProp->regsPerBlock );
    printf( "\nWarp size in threads \t - %d", pDeviceProp->warpSize );
    printf( "\nMemory Pitch \t - %zu bytes", pDeviceProp->memPitch );
    printf( "\nMaximum threads per block \t - %d", pDeviceProp->maxThreadsPerBlock );
    printf( "\nMaximum Thread Dimension (block) \t - %d %d %d", pDeviceProp->maxThreadsDim[0], pDeviceProp->maxThreadsDim[1], pDeviceProp->maxThreadsDim[2] );
    printf( "\nMaximum Grid Dimension \t - %d %d %d", pDeviceProp->maxGridSize[0], pDeviceProp->maxGridSize[1], pDeviceProp->maxGridSize[2] );
    printf( "\nTotal constant memory \t - %zu bytes", pDeviceProp->totalConstMem );
    printf( "\nCompute capability \t - %d.%d", pDeviceProp->major, pDeviceProp->minor );
    // clockRate and deviceOverlap are marked deprecated in recent CUDA toolkits.
    printf( "\nClock rate \t - %d KHz", pDeviceProp->clockRate );
    printf( "\nTexture Alignment \t - %zu bytes", pDeviceProp->textureAlignment );
    printf( "\nDevice Overlap \t - %s", pDeviceProp->deviceOverlap ? "Allowed" : "Not Allowed" );
    printf( "\nNumber of Multi processors \t - %d\n", pDeviceProp->multiProcessorCount );
}

int main(void)
{
    cudaDeviceProp deviceProp;
    int nDevCount = 0;

    // Bail out early if the runtime cannot even count the devices
    cudaError_t err = cudaGetDeviceCount( &nDevCount );
    if( err != cudaSuccess )
    {
        printf( "%s\n", cudaGetErrorString( err ) );
        return 1;
    }

    printf( "Total Device found: %d", nDevCount );
    for( int nDeviceIdx = 0; nDeviceIdx < nDevCount; ++nDeviceIdx )
    {
        memset( &deviceProp, 0, sizeof(deviceProp) );
        err = cudaGetDeviceProperties( &deviceProp, nDeviceIdx );
        if( err == cudaSuccess )
            DisplayProperties( &deviceProp );
        else
            printf( "\n%s", cudaGetErrorString( err ) );
    }
    return 0;
}
[/sourcecode]


 

Sharing my thoughts...

 

When and How should we use MsgWaitForMultipleObjects?

 

Most of us have used WaitForSingleObject and WaitForMultipleObjects in our code. So what more can we do with MsgWaitForMultipleObjects? (There’s no API called MsgWaitForSingleObject; you’ll understand why soon.)

MsgWaitForMultipleObjects waits both for waitable kernel objects and for input messages arriving in the current thread’s queue.

You may find it very useful in some situations. What if we could paint windows and process timer messages even while the UI thread is in a wait state? Pretty cool, no? Although I don’t know what happens under the hood of the “Add or Remove Programs” application, I believe it waits until the current uninstall program finishes, and during that time it still paints its window components properly.

The function takes the following shape.

[sourcecode language='cpp']
DWORD WINAPI MsgWaitForMultipleObjects(
__in DWORD nCount,
__in const HANDLE *pHandles,
__in BOOL bWaitAll,
__in DWORD dwMilliseconds,
__in DWORD dwWakeMask
);
[/sourcecode]

You can get detailed information and various parameters from the MSDN documentation of the API. Here I’m describing only the important ones.

You pass multiple waitable handles to the function as an array, along with the size of the array. The maximum number of objects is MAXIMUM_WAIT_OBJECTS – 1, because the last slot is used to report that a message is available. If the function returns WAIT_OBJECT_0 + nCount, new input of a type specified in dwWakeMask has arrived. The MSDN documentation describes the return values in detail.

The next important parameter is dwWakeMask. It specifies the types of input messages we wish to wait for.

There are a few caveats with this API, but it’s too early to describe them here, so read on after checking the code. Here’s a sample demonstration of the API.
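To make the return-value convention concrete, here is a minimal sketch of how a caller might classify the result. DescribeWaitResult is a hypothetical helper, not part of the Windows API, and the constants are redefined locally (with the values they have in the Windows SDK headers) only so the sketch compiles without windows.h.

```cpp
#include <cstdint>
#include <string>

// Local copies of the SDK values, so this sketch builds without <windows.h>.
constexpr std::uint32_t kWaitObject0    = 0x00000000; // WAIT_OBJECT_0
constexpr std::uint32_t kWaitAbandoned0 = 0x00000080; // WAIT_ABANDONED_0
constexpr std::uint32_t kWaitTimeout    = 0x00000102; // WAIT_TIMEOUT
constexpr std::uint32_t kWaitFailed     = 0xFFFFFFFF; // WAIT_FAILED

// Hypothetical helper: turn a MsgWaitForMultipleObjects result into text.
// nCount is the number of handles that was passed to the wait call.
std::string DescribeWaitResult(std::uint32_t result, std::uint32_t nCount)
{
    if (result == kWaitFailed)
        return "failed";
    if (result == kWaitTimeout)
        return "timeout";
    // WAIT_OBJECT_0 .. WAIT_OBJECT_0 + nCount - 1: one of the handles
    if (result - kWaitObject0 < nCount)
        return "handle " + std::to_string(result - kWaitObject0) + " signaled";
    // WAIT_OBJECT_0 + nCount: the extra slot that reports new input
    if (result == kWaitObject0 + nCount)
        return "input available";
    // WAIT_ABANDONED_0 .. WAIT_ABANDONED_0 + nCount - 1: abandoned mutex
    if (result >= kWaitAbandoned0 && result - kWaitAbandoned0 < nCount)
        return "handle " + std::to_string(result - kWaitAbandoned0) + " abandoned";
    return "unexpected";
}
```

With a single handle (nCount of 1), a result of 0 means the handle itself and a result of 1 means a message is waiting, which is exactly the check a one-handle wait loop has to make.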

[sourcecode language='cpp']
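#include <windows.h>
#include <tchar.h>
#include <stdio.h>

HANDLE hQuit = NULL; // Mutex handle, signaled when it is time to quit
// Custom thread message, taken from the application-defined range
const UINT DO_PROCESSING_MSG = WM_APP + 1;

DWORD WINAPI ThreadFxn( LPVOID )
{
    MSG msg;
    // Touch the queue once so that it exists before messages are posted to it
    PeekMessage( &msg, NULL, WM_USER, WM_USER, PM_NOREMOVE );

    while( true )
    {
        // Wait for the mutex and for all input events
        DWORD dwResult = MsgWaitForMultipleObjects( 1,
                                                    &hQuit,
                                                    FALSE,
                                                    INFINITE,
                                                    QS_ALLEVENTS );

        // WAIT_OBJECT_0 + nCount means a message has arrived
        if( dwResult == WAIT_OBJECT_0 + 1 )
        {
            // Drain the queue; (HWND)-1 selects messages posted to the thread itself
            while( PeekMessage( &msg, (HWND)-1, 0, 0, PM_REMOVE ) )
            {
                // Translate and dispatch message
                TranslateMessage( &msg );
                DispatchMessage( &msg );
                printf( "\nNew message received" );
            }
        }
        else // mutex signaled (or the wait failed): it's time to quit
        {
            printf( "\nbye bye" );
            break;
        }
    }
    return 0;
}

int _tmain( int argc, TCHAR* argv[] )
{
    int nRetCode = 0;
    // Create the mutex, initially owned by this thread (so not signaled)
    hQuit = CreateMutex( 0, TRUE, 0 );

    DWORD dwTid = 0;
    // Create new thread
    HANDLE hThread = CreateThread( 0, 0, ThreadFxn, 0, 0, &dwTid );

    // Post some messages
    for( int i = 0; i < 10; i++ )
    {
        // Post message to the thread's queue. A post made before the
        // thread has created its queue fails, so the first one can be lost.
        PostThreadMessage( dwTid, DO_PROCESSING_MSG, 0, 0 );
        Sleep( 500 );
    }

    // Release the mutex to signal the thread to quit
    ReleaseMutex( hQuit );
    // Wait till the thread exits
    WaitForSingleObject( hThread, INFINITE );
    CloseHandle( hThread );
    CloseHandle( hQuit );
    return nRetCode;
}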
[/sourcecode]

The tricky bWaitAll parameter.

If you check the documentation of the bWaitAll parameter in MSDN, you’ll find it described much as it is for the WaitForMultipleObjects API. In the sample code above, we’re waiting for one kernel object and for the thread messages we’re interested in.

What happens if we wait as follows ?

[sourcecode language='cpp']
DWORD dwResult = MsgWaitForMultipleObjects( 1,
&hQuit,
TRUE,
INFINITE,
QS_ALLEVENTS );
[/sourcecode]

Here we’re passing bWaitAll as TRUE instead of FALSE, which means that the function returns only when all the objects we’re waiting on are signaled. We pass only one kernel object, yet even if you call ReleaseMutex, the wait will not be satisfied until a message also arrives in the thread’s queue. That is, the function actually waits for all the kernel objects plus a message in the thread’s queue.

Raymond Chen explains another caveat, concerning MsgWaitForMultipleObjects and the queue state, in his blog. I’ll summarize the issue here.

Suppose PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE) returns TRUE, indicating that there is a message. Instead of processing the message, you ignore it and call MsgWaitForMultipleObjects.

This wait will not return immediately, even though there is a message in the queue. That’s because the call to PeekMessage told you that a message was ready, and you willfully ignored it. The MsgWaitForMultipleObjects function tells you only when there are new messages; any message that you already knew about doesn’t count. The MsgWaitForMultipleObjectsEx function lets you pass the MWMO_INPUTAVAILABLE flag to indicate that it should also check for previously ignored input. That should be enough here; read Raymond’s blog for more details.

Sidebar: The Windows CE documentation says it’s possible to wait on a critical section object with MsgWaitForMultipleObjects. I believe that could be a typo on their part; I have never succeeded in calling this API with critical section objects. Please let me know if you have managed to use it with one.
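The bWaitAll rule can be written down as a small truth function. This is only a model of the documented wake condition, not Windows code, and WaitWouldReturn is a hypothetical name invented for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of when MsgWaitForMultipleObjects stops waiting.
// handleSignaled: the state of each kernel object passed to the call.
// inputAvailable: whether new input matching dwWakeMask has arrived.
bool WaitWouldReturn(const std::vector<bool>& handleSignaled,
                     bool inputAvailable,
                     bool waitAll)
{
    if (waitAll)
    {
        // TRUE: every handle must be signaled AND input must have arrived
        return inputAvailable &&
               std::all_of(handleSignaled.begin(), handleSignaled.end(),
                           [](bool signaled) { return signaled; });
    }
    // FALSE: any one signaled handle OR new input is enough
    return inputAvailable ||
           std::any_of(handleSignaled.begin(), handleSignaled.end(),
                       [](bool signaled) { return signaled; });
}
```

In this model, one signaled handle with no input and waitAll set to true gives false, which matches the case of a released mutex that still leaves the thread waiting for a message.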


 



 

MFC way to tokenize string

 

The C runtime library has a lot of functions for extracting substrings based on delimiters. Functions like strtok and strchr are really handy for tokenizing strings. In MFC, the most widely used class for string manipulation is CString. To extract a substring from a CString object we have to combine CString::Find, CString::Right, CString::Left or CString::Mid. Either we implement this find-and-extract logic ourselves, or we rely on the TCHAR versions of the CRT functions (CString holds ANSI or Unicode strings according to the character type defined for the application) to tokenize a CString object.

With the release of Visual Studio 2005, MSDN brought a previously undocumented string extraction routine into the limelight. AfxExtractSubString (a global MFC function) provides a simple way to extract a substring from a CString object (or from a lengthy character constant). The function itself relies on CRT functions to extract the substring, but it’s still handy because it saves the developer the effort of writing that logic.

The function takes the following form.

[sourcecode language='cpp']
BOOL AFXAPI AfxExtractSubString (
CString& rString, //Return string
LPCTSTR lpszFullString, // Input string
int iSubString, // index of the substring to extract
TCHAR chSep = '\n' // Delimiter character
);
[/sourcecode]
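To see what such an index-based extraction involves, here is a portable sketch built on strchr. It is not the MFC source: ExtractSubString is a hypothetical stand-in that uses std::string and plain char in place of CString and TCHAR, and it only illustrates the skip-then-cut idea.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for AfxExtractSubString (std::string, not CString).
// Copies the index-th field of `full` into `out`; returns false if absent.
bool ExtractSubString(std::string& out, const char* full, int index, char sep = '\n')
{
    if (full == nullptr || index < 0)
    {
        out.clear();
        return false;
    }

    // Skip `index` separators to reach the start of the requested field
    while (index-- > 0)
    {
        full = std::strchr(full, sep);
        if (full == nullptr)
        {
            out.clear(); // fewer fields than requested
            return false;
        }
        ++full; // step past the separator
    }

    // The field ends at the next separator or at the end of the string
    const char* end = std::strchr(full, sep);
    if (end == nullptr)
        out.assign(full);
    else
        out.assign(full, static_cast<std::size_t>(end - full));
    return true;
}
```

Every call rescans from the start of the string, so walking all fields with an increasing index costs more than a single strtok pass. Two adjacent separators produce an empty field rather than being collapsed.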

The default delimiter is '\n'. Here’s a simple demonstration of AfxExtractSubString.

[sourcecode language='cpp']
// Input string
LPCTSTR lpszSource = _T( "quick brown fox jumps over the lazy dog" );
CString strExtracted; // Receives each extracted word
int i = 0; // Index of the substring to extract
while( AfxExtractSubString( strExtracted, lpszSource, i, _T( ' ' ) ) )
{
    // Cast explicitly: a CString object should not be passed through varargs
    _tprintf( _T( "%s\n" ), (LPCTSTR)strExtracted );
    // Advance to the next substring
    i++;
}
[/sourcecode]

The function is still not as powerful as strtok, which supports multiple delimiters; AfxExtractSubString supports only a single delimiter character. To handle several delimiters you may have to call the function multiple times with different delimiters. If you step into it in the debugger, you’ll find that the function is implemented using strchr :) . The function works fine with Visual Studio 6.0, but it was documented only with VS 2005. Hope this helps!
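If you need several delimiters at once, one alternative is a small splitter of your own. The sketch below uses std::string rather than CString, and SplitAny is a hypothetical name; it treats every character of delims as a delimiter, as strtok does, but leaves the input string untouched.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on any character found in `delims`.
// Runs of delimiters are collapsed, so no empty tokens are produced.
std::vector<std::string> SplitAny(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    // Start of the first token: first character that is not a delimiter
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // The token runs up to the next delimiter (or to the end of text)
        std::size_t end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip the delimiter run to find the next token, if any
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Calling SplitAny("a,b;c  d", ",; ") yields the four tokens a, b, c and d. Unlike AfxExtractSubString, this collapses consecutive delimiters, so choose whichever behaviour your data needs.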
