CUPTI Finalization and Cleanup Tutorial
The GitHub repo and complete tutorial are available at https://github.com/eunomia-bpf/cupti-tutorial.
Introduction
The CUPTI Finalization sample demonstrates proper cleanup procedures and finalization techniques for CUPTI-based profiling applications. This tutorial covers best practices for resource management, graceful shutdown procedures, and ensuring complete data collection before application termination.
What You'll Learn
- How to properly finalize CUPTI profiling sessions
- Understanding resource cleanup requirements
- Implementing graceful shutdown procedures
- Ensuring complete data collection and reporting
- Handling edge cases and error conditions during finalization
Understanding CUPTI Finalization
Proper CUPTI finalization is crucial for:
- Data Integrity: Ensuring all collected data is properly flushed and saved
- Resource Cleanup: Releasing CUPTI resources and avoiding memory leaks
- Graceful Shutdown: Handling application termination without data loss
- Error Handling: Managing cleanup in error conditions
- Performance Impact: Minimizing finalization overhead
Key Finalization Steps
Activity Buffer Finalization
- Flush remaining activity records
- Process pending activities
- Close activity streams
Event Group Cleanup
- Disable active event groups
- Destroy event group objects
- Release associated resources
Callback Deregistration
- Unsubscribe from callbacks
- Clean up callback data structures
- Ensure no pending callbacks
Context and Device Cleanup
- Destroy profiling contexts
- Release device resources
- Clean up per-device data structures
Building the Sample
Prerequisites
- CUDA Toolkit with CUPTI
- Application with CUPTI profiling integration
- Understanding of CUPTI resource management
Build Process
This creates the cupti_finalize executable, which demonstrates proper CUPTI cleanup procedures.
Running the Sample
Basic Execution
Sample Output
=== CUPTI Finalization Process ===
Starting profiling session...
Profiling active for 1500ms
Beginning finalization process...
Activity Buffer Finalization:
Flushing activity buffers...
Processed 2,847 activity records
Activity buffer finalization complete
Event Group Cleanup:
Disabling 3 active event groups
Destroying event group objects
Event cleanup complete
Callback Deregistration:
Unsubscribing from 5 callback domains
Cleaning up callback data structures
Callback cleanup complete
Context and Device Cleanup:
Cleaning up 2 device contexts
Releasing profiling resources
Context cleanup complete
Resource Validation:
Memory leaks: 0
Active handles: 0
Pending operations: 0
Finalization completed successfully in 45ms
All data saved to: profiling_results.json
=== Application Termination ===
Code Architecture
Finalization Manager
class CUPTIFinalizationManager {
private:
struct ResourceTracker {
std::vector<CUpti_EventGroup> activeEventGroups;
std::vector<CUpti_SubscriberHandle> activeCallbacks;
std::vector<CUcontext> managedContexts;
std::vector<CUpti_ActivityBufferState> activityBuffers;
bool isFinalized;
};
ResourceTracker resources;
std::mutex finalizationMutex;
std::atomic<bool> finalizationInProgress{false}; // Explicit initial value; a default-constructed atomic is uninitialized before C++20
public:
void registerEventGroup(CUpti_EventGroup group);
void registerCallback(CUpti_SubscriberHandle subscriber);
void registerContext(CUcontext context);
void beginFinalization();
void finalizeActivities();
void finalizeEventGroups();
void finalizeCallbacks();
void finalizeContexts();
bool validateCleanup();
};
Activity Buffer Finalization
void CUPTIFinalizationManager::finalizeActivities() {
std::cout << "Activity Buffer Finalization:" << std::endl;
// Force flush of all pending activity records
CUPTI_CALL(cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED));
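// Process any remaining buffered activities.
// Note: buffer and validSize are the values CUPTI passes to the buffer-completed
// callback registered with cuptiActivityRegisterCallbacks; they are not defined in this listing.
CUpti_Activity* record = nullptr;
size_t processedRecords = 0;
while (true) {
CUptiResult status = cuptiActivityGetNextRecord(buffer, validSize, &record);
if (status == CUPTI_SUCCESS) {
processActivityRecord(record);
processedRecords++;
} else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) {
break; // No more records in this buffer
} else {
CUPTI_ERROR_CHECK(status);
break; // Stop iterating after an unexpected error instead of looping on a stale record
}
}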
std::cout << " Processed " << processedRecords << " activity records" << std::endl;
// Disable activity recording
CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_KERNEL));
CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MEMCPY));
CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_OVERHEAD));
std::cout << " Activity buffer finalization complete" << std::endl;
}
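The listings in this tutorial call a CUPTI_CALL macro whose definition is not shown. The following is a minimal sketch of one way such a macro could behave, assuming failures should surface as exceptions. FakeResult, CHECKED_CALL, flushSucceeds, and flushFails are hypothetical stand-ins so that the block compiles without the CUPTI headers; they are not part of CUPTI or of the sample.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for CUptiResult so the sketch compiles without the CUPTI headers.
enum FakeResult { FAKE_SUCCESS = 0, FAKE_ERROR_INVALID_PARAMETER = 1 };

// Hypothetical checking macro: evaluates the call once and throws on failure.
// The exception message names the failing expression and the numeric code.
#define CHECKED_CALL(call)                                          \
    do {                                                            \
        const FakeResult status_ = (call);                          \
        if (status_ != FAKE_SUCCESS) {                              \
            throw std::runtime_error(std::string(#call) +           \
                " failed with code " +                              \
                std::to_string(static_cast<int>(status_)));         \
        }                                                           \
    } while (0)

// Hypothetical calls standing in for CUPTI functions.
inline FakeResult flushSucceeds() { return FAKE_SUCCESS; }
inline FakeResult flushFails() { return FAKE_ERROR_INVALID_PARAMETER; }

// Returns the error text produced by the first failing call, or "" if none failed.
inline std::string describeFailure() {
    try {
        CHECKED_CALL(flushSucceeds());
        CHECKED_CALL(flushFails());
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

With the real library, the stand-in type would be CUptiResult, the success value CUPTI_SUCCESS, and cuptiGetResultString could supply a readable message in place of the numeric code. Throwing rather than exiting is a design choice: it lets a caller such as ResilientFinalizer catch the failure and continue with the remaining cleanup steps.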
Event Group Cleanup
void CUPTIFinalizationManager::finalizeEventGroups() {
std::cout << "Event Group Cleanup:" << std::endl;
std::cout << " Disabling " << resources.activeEventGroups.size() << " active event groups" << std::endl;
for (auto& eventGroup : resources.activeEventGroups) {
// Read any final event values while the group is still enabled;
// CUPTI reports an invalid operation when reading from a disabled group
readFinalEventValues(eventGroup);
// Disable the event group
CUptiResult status = cuptiEventGroupDisable(eventGroup);
if (status != CUPTI_SUCCESS) {
std::cerr << "Warning: Failed to disable event group" << std::endl;
}
// Destroy the event group
CUPTI_CALL(cuptiEventGroupDestroy(eventGroup));
}
resources.activeEventGroups.clear();
std::cout << " Event cleanup complete" << std::endl;
}
void CUPTIFinalizationManager::readFinalEventValues(CUpti_EventGroup eventGroup) {
// Get the number of events in the group
uint32_t numEvents;
size_t size = sizeof(numEvents);
CUPTI_CALL(cuptiEventGroupGetAttribute(eventGroup,
CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS, &size, &numEvents));
if (numEvents > 0) {
// Read final event values before destruction
std::vector<CUpti_EventID> eventIds(numEvents);
std::vector<uint64_t> eventValues(numEvents);
size = numEvents * sizeof(CUpti_EventID);
CUPTI_CALL(cuptiEventGroupGetAttribute(eventGroup,
CUPTI_EVENT_GROUP_ATTR_EVENTS, &size, eventIds.data()));
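// cuptiEventGroupReadAllEvents takes separate byte sizes for the value and
// ID buffers and reports how many event IDs it actually read
size_t valueBytes = numEvents * sizeof(uint64_t);
size_t idBytes = numEvents * sizeof(CUpti_EventID);
size_t numIdsRead = 0;
CUPTI_CALL(cuptiEventGroupReadAllEvents(eventGroup,
CUPTI_EVENT_READ_FLAG_NONE, &valueBytes, eventValues.data(),
&idBytes, eventIds.data(), &numIdsRead));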
// Store final values for reporting
storeFinalEventValues(eventIds, eventValues);
}
}
Callback Deregistration
void CUPTIFinalizationManager::finalizeCallbacks() {
std::cout << "Callback Deregistration:" << std::endl;
std::cout << " Unsubscribing from " << resources.activeCallbacks.size() << " callback domains" << std::endl;
for (auto& subscriber : resources.activeCallbacks) {
// First disable all callbacks in the runtime and driver API domains for this subscriber
CUPTI_CALL(cuptiEnableDomain(0, subscriber, CUPTI_CB_DOMAIN_RUNTIME_API));
CUPTI_CALL(cuptiEnableDomain(0, subscriber, CUPTI_CB_DOMAIN_DRIVER_API));
// Wait for any pending callbacks to complete
waitForPendingCallbacks(subscriber);
// Unsubscribe from callbacks
CUPTI_CALL(cuptiUnsubscribe(subscriber));
}
resources.activeCallbacks.clear();
std::cout << " Callback cleanup complete" << std::endl;
}
void CUPTIFinalizationManager::waitForPendingCallbacks(CUpti_SubscriberHandle subscriber) {
// Ensure all pending callbacks have completed
// This is important to avoid race conditions during cleanup
const int maxWaitMs = 1000;
const int pollIntervalMs = 10;
int waitedMs = 0;
while (hasPendingCallbacks(subscriber) && waitedMs < maxWaitMs) {
std::this_thread::sleep_for(std::chrono::milliseconds(pollIntervalMs));
waitedMs += pollIntervalMs;
}
if (waitedMs >= maxWaitMs) {
std::cerr << "Warning: Timeout waiting for callbacks to complete" << std::endl;
}
}
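waitForPendingCallbacks depends on hasPendingCallbacks, which is not shown. One possible way to back it, sketched here under the assumption that the callback function itself can be instrumented, is an atomic counter of callbacks currently executing. InFlightCounter, Guard, and callbackBody are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: counts callbacks that are currently executing.
class InFlightCounter {
    std::atomic<int> count_{0};

public:
    // RAII guard: construct one at the top of the callback body so the
    // counter is decremented on every exit path.
    class Guard {
        InFlightCounter& owner_;

    public:
        explicit Guard(InFlightCounter& owner) : owner_(owner) {
            owner_.count_.fetch_add(1, std::memory_order_acq_rel);
        }
        ~Guard() { owner_.count_.fetch_sub(1, std::memory_order_acq_rel); }
        Guard(const Guard&) = delete;
        Guard& operator=(const Guard&) = delete;
    };

    // True while at least one guarded callback body is running.
    bool hasPending() const {
        return count_.load(std::memory_order_acquire) > 0;
    }
};

// Stand-in for a CUPTI callback body; returns whether the counter
// observed it as running while the guard was alive.
inline bool callbackBody(InFlightCounter& counter) {
    InFlightCounter::Guard guard(counter);
    return counter.hasPending();
}
```

A counter like this only reports callbacks that have already started. It does not stop a new callback from beginning after the check, which is why the callbacks are disabled before the wait.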
Advanced Finalization Techniques
Graceful Shutdown with Signal Handling
class GracefulShutdownManager {
private:
static std::atomic<bool> shutdownRequested;
static CUPTIFinalizationManager* finalizationManager;
public:
static void signalHandler(int /*signal*/) {
// Only set a flag here: stream I/O and CUPTI calls are not async-signal-safe.
// The main loop observes the flag and runs finalization outside the handler,
// which also avoids finalizing twice.
shutdownRequested = true;
}
static void setupSignalHandlers(CUPTIFinalizationManager* manager) {
finalizationManager = manager;
signal(SIGINT, signalHandler);
signal(SIGTERM, signalHandler);
signal(SIGABRT, signalHandler);
}
static bool isShutdownRequested() {
return shutdownRequested.load();
}
};
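// Static data members need out-of-class definitions
std::atomic<bool> GracefulShutdownManager::shutdownRequested{false};
CUPTIFinalizationManager* GracefulShutdownManager::finalizationManager = nullptr;
// Usage in main application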
int main() {
CUPTIFinalizationManager finalizationManager;
GracefulShutdownManager::setupSignalHandlers(&finalizationManager);
// Main application loop
while (!GracefulShutdownManager::isShutdownRequested()) {
// Perform profiling work
doProfilingWork();
// Check for completion
if (isWorkComplete()) {
break;
}
}
// Always finalize, whether through normal completion or signal
finalizationManager.beginFinalization();
return 0;
}
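Signal handlers can safely do very little, so a common pattern is to record the request in the handler and act on it from the main loop. The sketch below illustrates that pattern with a volatile std::sig_atomic_t flag and uses std::raise to simulate an incoming SIGTERM. All function and variable names are hypothetical, and the finalization call is represented only by a comment.

```cpp
#include <cassert>
#include <csignal>

// The only state the handler touches.
volatile std::sig_atomic_t g_shutdownRequested = 0;

// Handler: records the request and returns immediately.
extern "C" void recordShutdownRequest(int) { g_shutdownRequested = 1; }

void installShutdownHandlers() {
    std::signal(SIGINT, recordShutdownRequest);
    std::signal(SIGTERM, recordShutdownRequest);
}

bool shutdownWasRequested() { return g_shutdownRequested != 0; }

// Runs a work loop until a shutdown signal arrives and returns the number
// of iterations completed. iterationsBeforeSignal must be at least 1; the
// signal is raised by the loop itself purely to simulate an external request.
int runUntilSignalled(int iterationsBeforeSignal) {
    installShutdownHandlers();
    int iteration = 0;
    while (!shutdownWasRequested()) {
        ++iteration;  // stand-in for one unit of profiling work
        if (iteration == iterationsBeforeSignal) {
            std::raise(SIGTERM);  // handler runs before raise returns
        }
    }
    // Finalization would run here, exactly once, outside the handler.
    return iteration;
}
```

Keeping finalization in the main flow means it runs with normal thread context and locking available, at the cost of reacting only when the loop next checks the flag.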
Error-Resilient Finalization
class ResilientFinalizer {
private:
struct FinalizationStep {
std::string name;
std::function<void()> action;
bool isOptional;
bool completed;
};
std::vector<FinalizationStep> finalizationSteps;
public:
void addFinalizationStep(const std::string& name,
std::function<void()> action,
bool optional = false) {
finalizationSteps.push_back({name, action, optional, false});
}
void executeFinalization() {
std::vector<std::string> errors;
for (auto& step : finalizationSteps) {
try {
std::cout << "Executing: " << step.name << std::endl;
step.action();
step.completed = true;
std::cout << " ✓ " << step.name << " completed" << std::endl;
} catch (const std::exception& e) {
std::string error = step.name + ": " + e.what();
errors.push_back(error);
if (step.isOptional) {
std::cout << " ⚠ " << step.name << " failed (optional): " << e.what() << std::endl;
} else {
std::cout << " ✗ " << step.name << " failed (critical): " << e.what() << std::endl;
}
}
}
// Report finalization status
reportFinalizationStatus(errors);
}
private:
void reportFinalizationStatus(const std::vector<std::string>& errors) {
int completed = 0;
int critical_failed = 0;
for (const auto& step : finalizationSteps) {
if (step.completed) {
completed++;
} else if (!step.isOptional) {
critical_failed++;
}
}
std::cout << "Finalization Summary:" << std::endl;
std::cout << " Steps completed: " << completed << "/" << finalizationSteps.size() << std::endl;
std::cout << " Critical failures: " << critical_failed << std::endl;
if (!errors.empty()) {
std::cout << "Errors encountered:" << std::endl;
for (const auto& error : errors) {
std::cout << " - " << error << std::endl;
}
}
}
};
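To make the behavior of the class above easier to check, here is a condensed, hypothetical variant that returns a summary instead of printing. It is a sketch of the same idea (run every step, record failures, distinguish optional from critical) and not the sample's implementation. Step, Summary, runSteps, and demoRun are illustrative names, and the step bodies are empty stand-ins for real cleanup calls.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct Step {
    std::string name;
    std::function<void()> action;
    bool optional;
};

struct Summary {
    int completed = 0;
    int criticalFailures = 0;
    std::vector<std::string> errors;
};

// Runs every step in order; a failing step is recorded but never stops the
// steps that follow it.
inline Summary runSteps(const std::vector<Step>& steps) {
    Summary summary;
    for (const auto& step : steps) {
        try {
            step.action();
            ++summary.completed;
        } catch (const std::exception& e) {
            summary.errors.push_back(step.name + ": " + e.what());
            if (!step.optional) {
                ++summary.criticalFailures;
            }
        }
    }
    return summary;
}

// Example run: the optional middle step fails, the others succeed.
inline Summary demoRun() {
    const std::vector<Step> steps = {
        {"flush activities", [] {}, false},
        {"read final events",
         [] { throw std::runtime_error("group already disabled"); }, true},
        {"unsubscribe callbacks", [] {}, false},
    };
    return runSteps(steps);
}
```

In this example run the summary reports two completed steps, no critical failures, and one recorded error, because the failing step was marked optional.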
Resource Validation and Leak Detection
class ResourceValidator {
private:
struct ResourceSnapshot {
size_t allocatedMemory;
int activeHandles;
int pendingOperations;
std::chrono::steady_clock::time_point timestamp;
};
ResourceSnapshot initialSnapshot;
public:
void takeInitialSnapshot() {
initialSnapshot = getCurrentResourceSnapshot();
}
bool validateFinalState() {
ResourceSnapshot finalSnapshot = getCurrentResourceSnapshot();
bool isValid = true;
// Check for memory leaks
if (finalSnapshot.allocatedMemory > initialSnapshot.allocatedMemory) {
size_t leakedMemory = finalSnapshot.allocatedMemory - initialSnapshot.allocatedMemory;
std::cout << "Memory leak detected: " << leakedMemory << " bytes" << std::endl;
isValid = false;
}
// Check for unclosed handles
if (finalSnapshot.activeHandles > 0) {
std::cout << "Active handles remaining: " << finalSnapshot.activeHandles << std::endl;
isValid = false;
}
// Check for pending operations
if (finalSnapshot.pendingOperations > 0) {
std::cout << "Pending operations: " << finalSnapshot.pendingOperations << std::endl;
isValid = false;
}
if (isValid) {
std::cout << "Resource validation: PASSED" << std::endl;
} else {
std::cout << "Resource validation: FAILED" << std::endl;
}
return isValid;
}
private:
ResourceSnapshot getCurrentResourceSnapshot() {
ResourceSnapshot snapshot;
snapshot.allocatedMemory = getCurrentMemoryUsage();
snapshot.activeHandles = getActiveHandleCount();
snapshot.pendingOperations = getPendingOperationCount();
snapshot.timestamp = std::chrono::steady_clock::now();
return snapshot;
}
};
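The validator relies on helpers such as getActiveHandleCount that are not shown. One way such a count could be maintained, assuming each handle can be treated as an opaque pointer, is a small registry that is updated whenever a handle is created or destroyed. HandleRegistry and its methods are hypothetical and not part of CUPTI.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Hypothetical registry of live handles, keyed by their opaque pointer value.
class HandleRegistry {
    mutable std::mutex mutex_;
    std::unordered_set<const void*> live_;

public:
    // Call right after a handle is created.
    void track(const void* handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        live_.insert(handle);
    }

    // Call right after a handle is destroyed. Returns false if the handle
    // was never tracked or was already released, which points to a
    // double-release or an untracked creation site.
    bool release(const void* handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.erase(handle) == 1;
    }

    // Number of handles created but not yet released.
    std::size_t activeCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.size();
    }
};
```

A nonzero activeCount at the end of finalization would correspond to the "Active handles remaining" message printed by validateFinalState.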
Integration with Application Lifecycle
RAII-Based Resource Management
class CUPTISession {
private:
CUPTIFinalizationManager finalizationManager;
ResourceValidator validator;
bool isActive;
public:
CUPTISession() : isActive(false) {
validator.takeInitialSnapshot();
}
~CUPTISession() {
if (isActive) {
finalize();
}
}
void initialize() {
if (isActive) {
throw std::runtime_error("Session already active");
}
// Initialize CUPTI components
initializeActivityTracing();
initializeEventCollection();
initializeCallbacks();
isActive = true;
}
void finalize() {
if (!isActive) {
return;
}
try {
finalizationManager.beginFinalization();
isActive = false;
// Validate cleanup
validator.validateFinalState();
} catch (const std::exception& e) {
std::cerr << "Error during finalization: " << e.what() << std::endl;
}
}
// CUPTIFinalizationManager owns a std::mutex and a std::atomic, neither of
// which is movable, so the session cannot be moved either. Hold the session
// in a std::unique_ptr if ownership needs to be transferred.
CUPTISession(CUPTISession&&) = delete;
CUPTISession& operator=(CUPTISession&&) = delete;
// Disable copy semantics
CUPTISession(const CUPTISession&) = delete;
CUPTISession& operator=(const CUPTISession&) = delete;
};
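The main benefit of the RAII approach is that cleanup runs even when a scope is left through an exception. The sketch below shows that property in isolation with a hypothetical ScopedFinalizer that takes the cleanup action as a callable; in a real profiler the callable would invoke the session's finalization. The names and the simulated failure are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical scope guard: runs the given action when the scope ends.
class ScopedFinalizer {
    std::function<void()> finalize_;

public:
    explicit ScopedFinalizer(std::function<void()> finalize)
        : finalize_(std::move(finalize)) {}

    ~ScopedFinalizer() {
        if (finalize_) {
            try {
                finalize_();
            } catch (...) {
                // Destructors must not let exceptions escape.
            }
        }
    }

    ScopedFinalizer(const ScopedFinalizer&) = delete;
    ScopedFinalizer& operator=(const ScopedFinalizer&) = delete;
};

// Simulates profiled work that fails partway through and returns how many
// times the finalization action ran.
inline int runWithFailure() {
    int finalizeCalls = 0;
    try {
        ScopedFinalizer guard([&finalizeCalls] { ++finalizeCalls; });
        throw std::runtime_error("simulated failure during profiling");
    } catch (const std::runtime_error&) {
        // By the time control reaches here the guard has already run.
    }
    return finalizeCalls;
}
```

Swallowing exceptions in the destructor keeps stack unwinding safe, but it also hides cleanup errors, so code that needs to report them would call finalization explicitly before the scope ends.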
Finalization in Multi-threaded Applications
class ThreadSafeFinalizationManager {
private:
std::mutex finalizationMutex;
std::atomic<bool> finalizationComplete{false}; // Explicit initial value; a default-constructed atomic is uninitialized before C++20
std::vector<std::thread::id> activeThreads;
std::condition_variable finalizationCV;
public:
void registerThread() {
std::lock_guard<std::mutex> lock(finalizationMutex);
activeThreads.push_back(std::this_thread::get_id());
}
void unregisterThread() {
std::lock_guard<std::mutex> lock(finalizationMutex);
auto it = std::find(activeThreads.begin(), activeThreads.end(),
std::this_thread::get_id());
if (it != activeThreads.end()) {
activeThreads.erase(it);
}
// Notify if this was the last thread
if (activeThreads.empty()) {
finalizationCV.notify_all();
}
}
void waitForAllThreads() {
std::unique_lock<std::mutex> lock(finalizationMutex);
finalizationCV.wait(lock, [this] { return activeThreads.empty(); });
}
void finalizeThreadSafe() {
// Signal all threads to stop
signalShutdown();
// Wait for all threads to complete
waitForAllThreads();
// Perform final cleanup
performFinalization();
finalizationComplete = true;
}
};
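The class above leaves signalShutdown and performFinalization undefined. The following self-contained sketch shows the same stop, wait, then finalize sequence with a simple counter in place of the thread ID list. It assumes workers are registered by the launching thread before they start, so the wait cannot observe an empty set while a worker is still starting up. WorkerGate and runWorkers are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate: tracks active workers and a shared stop request.
class WorkerGate {
    std::mutex mutex_;
    std::condition_variable cv_;
    int active_ = 0;
    std::atomic<bool> stop_{false};

public:
    void enter() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++active_;
    }
    void leave() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--active_ == 0) {
            cv_.notify_all();
        }
    }
    void requestStop() { stop_.store(true); }
    bool stopRequested() const { return stop_.load(); }
    void waitForWorkers() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return active_ == 0; });
    }
};

// Starts workerCount threads, asks them to stop, waits for all of them, and
// returns how many had finished at the point where finalization would run.
inline int runWorkers(int workerCount) {
    WorkerGate gate;
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        gate.enter();  // registered before the thread exists
        workers.emplace_back([&gate, &finished] {
            while (!gate.stopRequested()) {
                std::this_thread::yield();  // stand-in for profiling work
            }
            finished.fetch_add(1);
            gate.leave();
        });
    }
    gate.requestStop();
    gate.waitForWorkers();
    const int finishedAtFinalization = finished.load();
    // Final cleanup would run here, after every worker has left the gate.
    for (auto& worker : workers) {
        worker.join();
    }
    return finishedAtFinalization;
}
```

Because each worker increments the counter before leaving the gate, the value read after waitForWorkers equals the number of workers started.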
Best Practices
Finalization Checklist
- Activity Buffers: Flush and process all pending activities
- Event Groups: Disable and destroy all event groups
- Callbacks: Unsubscribe from all callback domains
- Contexts: Clean up all CUPTI contexts
- Memory: Verify no memory leaks
- Handles: Ensure all handles are closed
- Files: Close all output files and streams
Common Pitfalls
- Premature Finalization: Finalizing while operations are still pending
- Resource Leaks: Forgetting to clean up event groups or callbacks
- Race Conditions: Finalizing while callbacks are still executing
- Data Loss: Not flushing activity buffers before shutdown
- Error Handling: Not handling cleanup errors gracefully
Next Steps
- Integrate proper finalization into your CUPTI applications
- Implement graceful shutdown procedures for production systems
- Add resource validation to detect leaks and improper cleanup
- Test finalization under various failure scenarios
- Combine with other CUPTI samples for comprehensive profiling solutions