A month ago, I traveled to San Jose, CA, to visit the GPU Technology Conference, GTC, and learn the latest on all things GPU.

The schedule was super packed, and more than once I missed an interesting talk because I was already sitting in another interesting talk.1

Here’s a list of sessions I found interesting or noteworthy, or that I want to (re)visit after the conference, sorted by topic.
Links to recordings and slides are provided. Bold indicates that I have not yet seen the talk and want to do so.

I’m only posting this today because the materials were private until now.2

  • Volta, CUDA 9
  • General CUDA, GPU:
  • GPU Data Management
    • S7362: Benchmarking The New Unified Memory Of CUDA 8 (link, recording)
    • S7628: The Future Of GPU Data Management (link, recording, slides)
    • S7285: Unified Memory On The Latest GPU Architectures (Pascal, Volta) (link, recording, slides)
    • S7764: GPUs: Using HMM To Blur The Lines Between CPU And GPU Programming (link, recording, slides)
    • S7128: How To Enable NVIDIA CUDA Stream Synchronous Communications Using GPUDirect (link, recording, slides)
    • S7700: An Introduction To The GPU Memory Model - Presented By Acceleware (session 2 Of 4) (link, recording)
  • Libraries, Packages, Tools
    • S7150: Accelerating cuBLAS/cuDNN Using Input-aware Auto-tuning: The ISAAC Library (link, recording, slides)
    • S7405: Bifrost: A Python/C++ Framework For Easy High-throughput Computing (link, recording, slides)
    • S7438: Build Systems: Combining CUDA And Modern CMake (link, recording, slides)
  • Multi-GPU, MPI
  • Other Programming Models (OpenACC, OpenMP, OpenCL, Etc.)
    • S7344: Kokkos - The C++ Performance Portability Programming Model (link, recording, slides)
    • S7192: OmpSs+OpenACC: Multi-target Task-based Programming Model Exploiting OpenACC GPU Kernels (link, recording, slides)
    • S7496: OpenCL At NVIDIA: Best Practices, Learnings, And Plans (link, recording, slides)
    • S7626: A Simple Guideline For Code Optimizations On Modern Architectures With OpenACC And CUDA (link, recording, slides)
    • S7636: Cache Directive Optimization In OpenACC Programming Model (link, recording, slides)
    • Use-Cases
      • S7341: Using OpenACC For NGS Techniques To Create A Portable And Easy-to-use Code Base (link, recording, slides)
      • S7640: Porting C++ Applications To GPUs With OpenACC For Lattice Quantum Chromodynamics (link, recording, slides)
      • S7672: OpenACC Best Practices: Accelerating The C++ NUMECA FINE/Open CFD (link, recording, slides)
      • S7635: Comparison Of OpenACC And OpenMP4.5 Offloading: Speeding Up Simulations Of Stellar Explosions (link, recording, slides)
      • S7478: Using OpenACC To Parallelize Irregular Algorithms On GPUs (link, recording, slides)
      • S7193: Achieving Portable Performance For GTC-P With OpenACC On GPU, Multi-core CPU, And Sunway Many-core Processor (link, recording, slides)
      • S7735: GPU Acceleration Of The Higrad Computational Fluid Dynamics Code With Mixed OpenACC And CUDA Fortran (link, recording, slides)
      • S7382: GPUs Unleashed: Analysis Of Petascale Molecular Simulations With VMD (link, recording, slides)
      • S7535: Potential Field Solutions Of The Solar Corona: Converting A PCG Solver From MPI To MPI+OpenACC (link, recording)
  • AI, Machine Learning, Deep Learning, and Siblings
    • S7457: Deep Learning Demystified (link, recording, slides)
    • S7515: Eliminating The Regular Expression With Neural Networks (link, recording, slides)
    • S7800: Leveraging The Power Of Google’s Cloud Machine Learning Service (presented By Google) (link, slides)
    • S7860: Starting A Deep Learning Project (link, recording, slides)
    • S7666: Learning Particle Physics By Example: Using Generative Adversarial Networks To Accelerate Physics (link, recording, slides)
    • S7804: TensorFlow: Open Source Machine Learning (presented By Google) (link, recording)
  • Round Tables, Panels
    • SE7142: CUDA Developer Tools Round Table (nothing on this :()
    • S7564: Accelerator Programming Ecosystems (link, recording, slides)
  • Use-Cases, Applications
    • S7332: Accelerated Astrophysics: Using NVIDIA DGX-1 To Simulate And Understand The Universe (link, recording, slides)
  • Others
    • Python:
    • S7609: Porting After Effects To The GPU (link, recording, slides)
    • S7590: Passengers: Awakening VR, When Film Meets VR (link, nothing on this :()
    • S7296: Cloudlighting: Merging GPU-based HPC With Cloud Services (link, recording, slides)
    • S7329: Open-source Tools For GPU Programming Assignments In Large Classroom Settings (link, recording, slides)
    • S7482: Advances In Real-time Graphics At Pixar (link, unfortunately nothing else, even though I thought they said during the session that materials would be published)
    • S7642: Preparing GPU-accelerated Applications For The Summit Supercomputer (link, recording, slides)
  • Keynote (link)
  1. The pinnacle was the Wednesday 4pm time slot, when four “what’s new this year”-style talks took place at the same time. Talk about parallelism.