Blog

August 29, 2022

Fast Beam Search Decoding in PyTorch with TorchAudio and Flashlight Text

Beam search decoding with industry-leading speed from Flashlight Text (part of the Flashlight ML framework) is now available with official support in TorchAudio, bringing high-performance beam search and text utilities for speech and text applications built on top of PyTorch. The current integration supports CTC-style decoding, but it can be used for any modeling setting that outputs token-level probability distributions over time steps.
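To make "CTC-style decoding" over token-level probability distributions concrete, here is a minimal pure-Python sketch of CTC prefix beam search. It is an illustration of the general technique only, not the Flashlight Text or TorchAudio implementation, and it has no lexicon or language model. The function names (`ctc_beam_search`, `peaked`), the three-token vocabulary, and the example frames are all assumptions made up for this sketch.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_beam_search(log_probs, beam_size=4, blank=0):
    """Decode a list of T frames, each a list of V log-probabilities.

    Each beam entry maps a label prefix to a pair of scores:
    (log-prob of alignments ending in blank, log-prob ending in non-blank).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            p_total = log_add(p_b, p_nb)
            for token, lp in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (log_add(b, p_total + lp), nb)
                    continue
                extended = prefix + (token,)
                if prefix and prefix[-1] == token:
                    # A repeated token extends the prefix only after a blank...
                    b, nb = next_beams[extended]
                    next_beams[extended] = (b, log_add(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (b, log_add(nb, p_nb + lp))
                else:
                    b, nb = next_beams[extended]
                    next_beams[extended] = (b, log_add(nb, p_total + lp))
        # Prune to the highest-scoring prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: log_add(*kv[1]))
    return list(best), log_add(p_b, p_nb)


def peaked(index, vocab=3, peak=0.9):
    """Hypothetical frame: most probability mass on one token."""
    rest = (1.0 - peak) / (vocab - 1)
    return [math.log(peak if i == index else rest) for i in range(vocab)]


# Token ids (assumed): 0 = blank, 1 = "a", 2 = "b".
frames = [peaked(1), peaked(1), peaked(0), peaked(1), peaked(2)]
tokens, score = ctc_beam_search(frames, beam_size=4)
print(tokens, score)
```

In this toy input the two leading "a" frames collapse into one label, and the blank frame separates them from the next "a", so the decoded sequence contains "a" twice before "b". A production decoder would add the lexicon and language-model scoring that this sketch leaves out.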

August 26, 2022

Introducing nvFuser, a deep learning compiler for PyTorch

nvFuser is a Deep Learning Compiler for NVIDIA GPUs that automatically just-in-time compiles fast and flexible kernels to reliably accelerate users’ networks. It provides significant speedups for deep learning networks running on Volta and later CUDA accelerators by generating fast custom “fusion” kernels at runtime. nvFuser is specifically designed to meet the unique requirements of the PyTorch community, and it supports diverse network architectures and programs with dynamic inputs of varyi...

August 24, 2022

Accelerating PyTorch Vision Models with Channels Last on CPU

August 18, 2022

Easily list and initialize models with new APIs in TorchVision

TorchVision now supports listing and initializing all available built-in models and weights by name. This new API builds upon the recently introduced Multi-weight support API, is currently in Beta, and it addresses a long-standing request from the community.

August 16, 2022

Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16

July 22, 2022

Introducing the PlayTorch app: Rapidly Create Mobile AI Experiences

July 19, 2022

What Every User Should Know About Mixed Precision Training in PyTorch

Efficient training of modern neural networks often relies on lower-precision data types. On A100 GPUs, peak float16 matrix multiplication and convolution performance is 16x faster than peak float32 performance. Because the float16 and bfloat16 data types are only half the size of float32, they can also double the performance of bandwidth-bound kernels and reduce the memory required to train a network, allowing for larger models, larger batches, or larger inputs. Using a module like torch.amp...
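The "half the size" claim can be checked with a few lines of arithmetic. The sketch below uses only the Python standard library, not PyTorch. The parameter count is a hypothetical figure, and the helper names (`storage_bytes`, `round_to_float16`) are assumptions introduced for illustration. It also shows the trade-off that comes with the smaller format: float16 cannot represent values as finely as float32.

```python
import struct

# Bytes per element. struct knows float32 ("f") and IEEE float16 ("e");
# bfloat16 is also a 16-bit format but has no struct code, so it is hardcoded.
BYTES_PER_ELEMENT = {
    "float32": struct.calcsize("f"),
    "float16": struct.calcsize("e"),
    "bfloat16": 2,
}


def storage_bytes(num_elements, dtype):
    """Raw storage needed for num_elements values of the given dtype."""
    return num_elements * BYTES_PER_ELEMENT[dtype]


def round_to_float16(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


params = 100_000_000  # hypothetical model size, for illustration only
fp32_bytes = storage_bytes(params, "float32")
fp16_bytes = storage_bytes(params, "float16")
ratio = fp32_bytes / fp16_bytes

# Near 1.0 the spacing between float16 values is 2**-10, so a small
# increment such as 0.0001 is lost when the value is stored in float16.
rounded = round_to_float16(1.0001)

print(f"float32: {fp32_bytes / 1e6:.0f} MB, float16: {fp16_bytes / 1e6:.0f} MB")
print(f"ratio: {ratio}, 1.0001 stored as float16: {rounded}")
```

Halving the bytes per element halves the data a bandwidth-bound kernel has to move, which is where the potential 2x gain comes from. The rounding result shows why lower precision has to be applied selectively rather than everywhere.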

PyTorch strengthens its governance by joining the Linux Foundation
