Judging by the Dataset Viewer's GitHub code, it does not appear to actively convert audio for preview purposes, so adding a lightweight preview version of the audio data would be a good idea.
For example, with image datasets, you often see a setup where high-quality main data is accompanied by a JPEG for previewing. I think the same approach could work for audio datasets as well…
Just in case, pinging @lhoestq.
Audio sample rates supported by the Hugging Face Dataset Viewer?
Short answer
I do not see a documented Hugging Face Dataset Viewer whitelist that says something like:
“The Dataset Preview audio player supports only 8 kHz, 16 kHz, 44.1 kHz, 48 kHz, etc.”
The public Hugging Face docs say that audio datasets can get a Dataset Viewer when they use a supported repository structure and supported file formats. The Hugging Face Hub audio dataset guide explicitly lists AIFF, FLAC, MP3, OGG, and WAV as supported audio formats.
However, that is format-level support, not a guarantee that every possible WAV profile will play in the browser preview player.
For my dataset, the safest interpretation is:
65,536 Hz / 24-bit WAV is valid research audio, but it is not a safe browser-preview target. The sample_rate / sampling_rate metadata is useful for programmatic loading and resampling, but it should not be treated as a setting that forces the Hub Dataset Preview player to transcode or accept the original uploaded WAV files.
So, yes: a separate browser-preview copy at 44.1 kHz / 16-bit or 48 kHz / 16-bit is the practical workaround.
I would keep the original 65,536 Hz / 24-bit WAV files as the canonical corpus, and add a clearly labeled preview config or preview split such as browser_preview_48k16.
Why this is not just “WAV supported or unsupported”
There are several layers involved:
| Layer | What it answers | Relevance here |
|---|---|---|
| Hub repository storage | Can Hugging Face host the files? | Yes. The files can be stored on the Hub. |
| Dataset structure recognition | Can Hugging Face infer splits, metadata, and audio columns? | Mostly yes, if the repository follows the expected audio dataset layout. |
| Dataset Viewer backend | Can the Viewer generate precomputed rows, Parquet exports, and preview assets? | This can fail independently of ordinary file hosting. |
| Browser audio playback | Can the user's browser play the served audio file? | This is the risky part for 65,536 Hz / 24-bit WAV. |
| Python datasets loading | Can users load and resample the audio in code? | Usually a separate and more controllable path. |
The Dataset Viewer backend docs describe the Viewer as a backend/API layer that precomputes data and auto-converts Hub datasets to Parquet. The Dataset Viewer Quickstart also exposes separate endpoints for checking validity, splits, first rows, row slices, Parquet files, size, and statistics.
That matters because a dataset can be valid and loadable while the preview UI still fails.
In other words:
A working Hugging Face dataset does not necessarily imply a working browser audio player for every source file.
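If you want to probe those layers programmatically rather than through the UI, a minimal sketch against the documented datasets-server endpoints (the dataset name is this corpus; the rest is plain `requests` usage) could look like this:

```python
import requests

BASE = "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co"
DATASET = "cjweaver/ARU_speech_corpus"

# Each endpoint answers a different layer from the table above:
# is-valid -> overall viewer validity, splits -> structure recognition,
# parquet -> auto-converted Parquet availability.
for endpoint in ("is-valid", "splits", "parquet"):
    resp = requests.get(f"{BASE}/{endpoint}", params={"dataset": DATASET}, timeout=30)
    print(endpoint, resp.status_code)
    print(resp.json())
```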
What is special about this dataset?
The dataset page for cjweaver/ARU_speech_corpus is already recognized by Hugging Face as an audio/text dataset in soundfolder format. It shows a default subset with 8.64k rows, train/validation/test splits, and auto-converted Parquet. It also currently shows the Dataset Viewer failing for the selected split with UnexpectedApiError.
That suggests the issue is probably not basic dataset discovery. Hugging Face is not simply ignoring the repository. It sees it as an audio dataset.
The dataset card says the corpus contains:
- 8,640 utterances
- 12 native British English speakers
- 720 IEEE sentences per speaker
- single-channel recordings
- controlled anechoic recording conditions
- 65,536 Hz sampling rate
- 24-bit depth
The card also notes that the high sample rate makes the corpus useful for wideband and super-wideband speech processing research, while also recommending downsampling for many applications that do not need that bandwidth.
So I would not describe the source audio as “wrong.” It is a legitimate archival/research format. It is just not a conservative browser-preview format.
What the Dataset Viewer backend code suggests
The public huggingface/dataset-viewer repository clarifies the backend path, but with one important limitation: the repository README says it is the backend that provides the Dataset Viewer with precomputed data through an API, and that the frontend viewer component is not part of that repository.
So the repo can clarify things like:
- audio row post-processing;
- audio asset generation;
- supported backend audio extensions;
- whether there is an obvious sample-rate whitelist;
- whether .wav is treated specially.
It cannot fully reveal:
- the closed Hub frontend audio component;
- browser-specific decoding behavior;
- how every browser handles an unusual WAV profile.
Still, the backend code is very useful.
1. Backend support appears extension/MIME-based, not sample-rate-whitelist-based
In asset.py, the backend defines supported audio extensions and MIME types like this:
SUPPORTED_AUDIO_EXTENSION_TO_MEDIA_TYPE = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".opus": "audio/ogg",
    ".flac": "audio/x-flac",
}
I do not see a visible backend-side list like:
SUPPORTED_SAMPLE_RATES = [8000, 16000, 44100, 48000]
or:
MAX_AUDIO_SAMPLE_RATE = 48000
The visible backend check is about extension and media type, not “this WAV must be 16 kHz / 44.1 kHz / 48 kHz.”
That supports this interpretation:
Hugging Face’s backend likely recognizes .wav as a supported audio extension, but that does not mean every WAV profile is browser-playable.
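As a purely conceptual illustration of that point (not the viewer's actual code), an extension-based lookup like the one quoted above ignores sample rate and bit depth entirely:

```python
from pathlib import Path

# Mirrors the mapping quoted above; the real backend may differ in detail.
SUPPORTED_AUDIO_EXTENSION_TO_MEDIA_TYPE = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".opus": "audio/ogg",
    ".flac": "audio/x-flac",
}

def media_type_for(path: str):
    # Only the file extension is consulted; sample rate and bit depth
    # never enter this decision.
    return SUPPORTED_AUDIO_EXTENSION_TO_MEDIA_TYPE.get(Path(path).suffix.lower())

print(media_type_for("ID01_list01_sent01.wav"))  # "audio/wav", even for 65,536 Hz / 24-bit
```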
2. Existing .wav files may be passed through unchanged
In features.py, the audio handling code checks whether the audio value has a path, whether there are no embedded bytes, whether the file extension is supported, and whether the path starts with hf://datasets/....
If so, it returns an AudioSource using the resolved Hub URL and the MIME type for that extension.
Conceptually, for this dataset, that can look like:
65,536 Hz / 24-bit .wav in dataset repo
↓
Dataset Viewer backend sees ".wav"
↓
".wav" is a supported audio extension
↓
backend returns URL + type "audio/wav"
↓
Hub frontend / browser tries to play the original WAV
That is the key point.
If the backend is returning a URL to the original .wav, then adding a sample_rate metadata field will not necessarily change what the browser receives.
3. Conversion exists, but likely not for already-supported .wav files
The backend has a create_audio_file path in asset.py.
The relevant behavior is roughly:
- if the source extension matches the target extension, it writes the bytes directly;
- if the source extension differs, it converts using pydub.AudioSegment.from_file(...) and exports to the target format;
- the code comment notes that this conversion may spawn FFmpeg.
That means conversion exists, but the subtle issue is:
If the source file is already .wav, the backend may treat it as supported and may not normalize it to 44.1/16 or 48/16.
So the fact that the files are WAV may cause the backend to pass them through, rather than convert them to a browser-safe derivative.
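For reference, the kind of conversion the backend comment describes (pydub decoding, possibly spawning FFmpeg) looks roughly like this. It is a sketch of the general mechanism, not the viewer's actual conversion code, and the 48 kHz / 16-bit target is my choice here, not something the backend enforces:

```python
from pydub import AudioSegment

# Decode the source (pydub may call out to FFmpeg for this),
# then normalize to a conservative browser target.
segment = AudioSegment.from_file("input.wav")
segment = (
    segment.set_channels(1)         # mono
           .set_frame_rate(48_000)  # 48 kHz
           .set_sample_width(2)     # 16-bit samples
)
segment.export("output_preview_48k16.wav", format="wav")
```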
4. sampling_rate exists, but not as a preview-player target setting
The backend also has an AudioDecoder path in features.py. In that path, it extracts an audio array and _sampling_rate, then writes a WAV using soundfile.write(..., _sampling_rate, format="wav").
That is not the same thing as:
“Please make all preview audio 44.1 kHz or 48 kHz.”
It simply means that when the backend has decoded audio-array data, it can write a WAV using the sampling rate associated with that decoded object.
This matches the Hugging Face datasets audio loading docs, which describe audio decoding and access through the Audio feature.
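As a contrast, this is roughly what "write a WAV at the decoded sampling rate" looks like with soundfile; the array and rate below are made-up placeholders, and nothing in this path forces a browser-safe 44.1/48 kHz target:

```python
import numpy as np
import soundfile as sf

# Hypothetical decoded example: the array keeps whatever rate it was decoded at.
sampling_rate = 65_536
audio_array = np.zeros(sampling_rate, dtype=np.float32)  # one second of silence

# The WAV is written at the decoded rate, not resampled to 44.1/48 kHz.
sf.write("row_preview.wav", audio_array, sampling_rate, format="WAV")
```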
What sample_rate / sampling_rate metadata actually does
This is the point most likely to cause confusion.
The datasets audio processing docs show that an audio column can be resampled programmatically with:
from datasets import load_dataset, Audio
dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
The docs say audio files are decoded and resampled on the fly the next time an example is accessed.
The Hugging Face Audio Course preprocessing page makes the same point: cast_column(..., Audio(sampling_rate=...)) does not change the audio in place; it tells datasets to resample examples on the fly when they are loaded.
So:
| Mechanism | What it does |
|---|---|
| Audio(sampling_rate=16000) | Resamples decoded examples when accessed through datasets |
| dataset_info / feature metadata | Describes the dataset schema/features |
| Uploaded .wav file | Remains the actual stored file |
| Dataset Preview player | Receives an audio URL/source and depends on backend/frontend/browser handling |
In plain language:
sampling_rate is useful for Python users. It is not a guaranteed setting for forcing the Hub web player to transcode the uploaded WAV before playback.
This is valid for Python users:
from datasets import load_dataset, Audio

# Load the canonical corpus, then ask datasets to resample on access.
ds = load_dataset("cjweaver/ARU_speech_corpus", split="train")
ds_16k = ds.cast_column("audio", Audio(sampling_rate=16_000))
example = ds_16k[0]["audio"]  # decoded and resampled to 16 kHz on the fly
But I would not assume that this happens automatically in the Hub UI:
uploaded 65,536 Hz / 24-bit WAV
↓
Hub silently creates browser-safe 44.1 kHz / 16-bit WAV
↓
Dataset Preview player uses the converted file
That would be a preview transcoding feature. I do not see documentation that says the Dataset Preview player does that for this kind of WAV edge case.
Browser playback is its own compatibility layer
The final playback layer is the browser.
MDN’s <audio> element documentation describes browser audio playback as source-based: the browser receives one or more sources and attempts to play a suitable one.
MDN’s audio codec guide frames web audio as a codec/container compatibility problem, not merely “the file has an audio extension.”
That matters because this file is not just “a WAV.” It is:
- a WAV container
- PCM audio
- 65,536 Hz sample rate
- 24-bit depth
- single channel
A browser may handle conventional 44.1 kHz or 48 kHz PCM WAV more reliably than an unusual 65,536 Hz / 24-bit WAV. The Dataset Viewer backend may consider .wav supported, but the browser still has to decode the actual stream.
My diagnosis for this exact case
I would rank the likely explanations like this.
Most likely
The Dataset Viewer backend recognizes the dataset and treats .wav as supported, but the original 65,536 Hz / 24-bit WAV reaches the preview/player path unchanged. The frontend/browser or the viewer asset path then fails on this unusual WAV profile.
This explanation fits:
- the dataset being recognized as soundfolder;
- the presence of Parquet conversion;
- the backend code supporting .wav by extension/MIME type;
- the absence of an obvious backend sample-rate whitelist;
- the unusual nature of 65,536 Hz / 24-bit audio for browser playback.
Also possible
One or more individual WAV files may be malformed, inconsistent, or difficult to decode. A single problematic file can sometimes break first-row generation or row post-processing.
Also possible
There may be a metadata/path/layout issue, especially if metadata.csv paths do not exactly match what AudioFolder expects.
The Hugging Face audio dataset docs say metadata.csv must contain a file_name column that links audio files to metadata, and relative paths must be full relative paths when files are not next to the metadata file.
Less likely
A hidden documented rule such as:
Dataset Viewer WAV sample_rate must be <= 48,000 Hz
I have not found such a public rule in the docs or in the visible backend code.
Should the original files be replaced?
No.
I would not replace the canonical corpus with downsampled 44.1/16 or 48/16 files.
The original format is part of the dataset’s value. The dataset card describes anechoic recordings, professional measurement equipment, 65,536 Hz sampling, 24-bit depth, careful filtering, active-speech-level normalization, and use cases such as speech intelligibility, signal processing, ASR, speech quality assessment, and acoustic research.
That original format is not a mistake.
Instead, I would treat this as two parallel purposes:
| Purpose | Best format |
|---|---|
| Canonical research corpus | 65,536 Hz / 24-bit WAV |
| Programmatic ASR/model use | Load original and resample with Audio(sampling_rate=...) |
| Hub browser preview | 44.1 kHz or 48 kHz / 16-bit PCM WAV |
| Human quick inspection | Browser-preview config |
This gives different users the right version without weakening the original dataset.
Should the workaround be a samples subdirectory?
The idea is right, but I would avoid the name samples.
A folder named samples is ambiguous. It can mean:
- example files;
- a reduced training subset;
- a split;
- a second config;
- a browser preview;
- or a separate derivative dataset.
Use a name that encodes the purpose and format:
browser_preview_48k16/
or:
browser_preview_44k16/
I would use:
browser_preview_48k16/
because 48 kHz / 16-bit PCM WAV is a conservative media/web-style preview target. The proposed 44.1 kHz / 16-bit version is also reasonable.
44.1 kHz / 16-bit vs 48 kHz / 16-bit
Both are reasonable.
| Preview format | My view |
|---|---|
| 48 kHz / 16-bit PCM WAV | My preferred browser-preview choice. Common in media/video/web workflows; comfortably preserves speech-band content. |
| 44.1 kHz / 16-bit PCM WAV | Also good. Familiar "CD-quality" convention; likely much safer than 65,536/24. |
| 16 kHz / 16-bit WAV | Good for many ASR workflows, but too model-specific as the only general preview. |
| MP3 | Very browser-friendly and small, but lossy and less appropriate as a research-facing preview. |
| FLAC | Lossless and compact, but I would test browser/viewer behavior before relying on it. |
Because the source card says the signal was low-pass filtered above 9 kHz, both 44.1 kHz and 48 kHz are more than enough for listening preview. The main goal is not preserving the original acquisition rate in the preview; the main goal is providing a stable browser-playable representation while leaving the original intact.
Recommended repository structure
A clear structure would be:
README.md
train/
    metadata.csv
    *.wav              # canonical 65,536 Hz / 24-bit
validation/
    metadata.csv
    *.wav              # canonical 65,536 Hz / 24-bit
test/
    metadata.csv
    *.wav              # canonical 65,536 Hz / 24-bit
browser_preview_48k16/
    metadata.csv
    *.wav              # selected 48 kHz / 16-bit browser-preview files
Then document the preview directory clearly.
A conceptual YAML configuration could look like:
---
configs:
  - config_name: default
    data_files:
      - split: train
        path: train/**
      - split: validation
        path: validation/**
      - split: test
        path: test/**
  - config_name: browser_preview_48k16
    data_files:
      - split: preview
        path: browser_preview_48k16/**
---
I would test this on a small branch or a tiny duplicate dataset first. The exact YAML may need adjustment depending on how AudioFolder infers the layout, but the design principle is strong:
Separate canonical source audio from browser-preview derivative audio.
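Once a test branch or duplicate repo carries that YAML, a quick sanity check that the proposed config resolves could look like this (the config and split names are the proposed ones, so this only works after they exist):

```python
from datasets import load_dataset

# Assumes the browser_preview_48k16 config and preview split proposed above.
preview = load_dataset(
    "cjweaver/ARU_speech_corpus",
    "browser_preview_48k16",
    split="preview",
)
print(preview)
print(preview[0]["audio"]["sampling_rate"])  # expect 48000 for the preview files
```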
Recommended preview metadata
For preview files, I would include metadata that links each preview file back to the canonical source file.
Example:
file_name,source_file_name,speaker_id,sex,age,accent,list_number,sentence_number,text,preview_sample_rate,preview_bit_depth,preview_purpose
ID01_list01_sent01_preview_48k16.wav,train/ID01_ARU_Fs=65536Hz_Standard speech - List 1 - Sentence 1 - Version 1_0.wav,01,M,47,Avon,1,1,"<transcription>",48000,16,browser preview only
The important required-style column is file_name, because the Hugging Face audio dataset docs use that column to link metadata rows to audio files.
The source_file_name and preview-related columns are for provenance and clarity.
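A small sketch of how such a preview metadata.csv could be generated from the canonical train metadata; the column names follow the example row above and are illustrative, not required by the Hub:

```python
import csv
from pathlib import Path

PREVIEW_DIR = Path("browser_preview_48k16")
PREVIEW_DIR.mkdir(exist_ok=True)

rows = []
with open("train/metadata.csv", newline="", encoding="utf-8") as f:
    for source in csv.DictReader(f):
        preview_name = Path(source["file_name"]).stem + "_preview_48k16.wav"
        rows.append({
            **source,
            "file_name": preview_name,                            # required link column
            "source_file_name": f"train/{source['file_name']}",   # provenance
            "preview_sample_rate": 48000,
            "preview_bit_depth": 16,
            "preview_purpose": "browser preview only",
        })

with open(PREVIEW_DIR / "metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```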
How many preview files?
I would start with a representative subset, not a full duplicate of all 8,640 files.
A good first pass:
12 speakers × 10 utterances = 120 preview files
Make sure the preview subset covers:
- all 12 speakers;
- both sexes;
- varied ages and accents;
- examples from train, validation, and test;
- varied sentence/list numbers;
- transcripts;
- source-file provenance.
That is enough for a useful Hub preview while keeping the derivative set clearly secondary.
If it works and row-by-row preview for every example becomes important, the preview config can be expanded later.
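To pick that subset in a speaker-stratified way rather than with the simple head -n 120 shown in the batch FFmpeg example below, a sketch like this could be used (it assumes a speaker_id column as in the preview metadata example above):

```python
import csv
from collections import defaultdict

UTTERANCES_PER_SPEAKER = 10  # 12 speakers x 10 = 120 preview files

by_speaker = defaultdict(list)
with open("train/metadata.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        by_speaker[row["speaker_id"]].append(row["file_name"])

# Take the first N utterances per speaker, sorted for reproducibility.
preview_files = [
    name
    for speaker, names in sorted(by_speaker.items())
    for name in sorted(names)[:UTTERANCES_PER_SPEAKER]
]
print(len(preview_files))
```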
Conversion commands
For 48 kHz / 16-bit mono WAV:
mkdir -p browser_preview_48k16
ffmpeg -y -i "input.wav" \
-ac 1 \
-ar 48000 \
-sample_fmt s16 \
"browser_preview_48k16/output_preview_48k16.wav"
For 44.1 kHz / 16-bit mono WAV:
mkdir -p browser_preview_44k16
ffmpeg -y -i "input.wav" \
-ac 1 \
-ar 44100 \
-sample_fmt s16 \
"browser_preview_44k16/output_preview_44k16.wav"
Batch example:
mkdir -p browser_preview_48k16
find train validation test -name "*.wav" | head -n 120 | while read -r f; do
base="$(basename "${f%.wav}")"
ffmpeg -y -i "$f" \
-ac 1 \
-ar 48000 \
-sample_fmt s16 \
"browser_preview_48k16/${base}_preview_48k16.wav"
done
Because the original source is 24-bit and high-rate, I would use FFmpeg or another well-tested resampler rather than a quick-and-dirty script that just drops samples. The Hugging Face Audio Course preprocessing page notes that resampling is not an in-place rewrite in datasets; for actual derivative files, do the conversion explicitly and document it.
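If a Python path is preferred over the FFmpeg commands above, a hedged equivalent with librosa and soundfile (assuming both are installed) would be:

```python
from pathlib import Path

import librosa
import soundfile as sf

TARGET_SR = 48_000
OUT_DIR = Path("browser_preview_48k16")
OUT_DIR.mkdir(exist_ok=True)

# librosa resamples on load with a high-quality resampler and folds to mono.
audio, sr = librosa.load("input.wav", sr=TARGET_SR, mono=True)

# Write 16-bit PCM explicitly rather than relying on a default subtype.
sf.write(OUT_DIR / "output_preview_48k16.wav", audio, TARGET_SR, subtype="PCM_16")
```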
Diagnostic checks I would run
1. Dataset Viewer API checks
Use the Dataset Viewer API to locate the failure layer.
The Dataset Viewer API docs and /rows docs are useful here. The /rows docs say image and audio samples are represented by URLs, and those assets are cached temporarily.
Run:
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Fsplits%3Fdataset%3Dcjweaver%2FARU_speech_corpus"
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Fparquet%3Fdataset%3Dcjweaver%2FARU_speech_corpus"
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Ffirst-rows%3Fdataset%3Dcjweaver%2FARU_speech_corpus%26amp%3Bconfig%3Ddefault%26amp%3Bsplit%3Dtrain"
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Frows%3Fdataset%3Dcjweaver%2FARU_speech_corpus%26amp%3Bconfig%3Ddefault%26amp%3Bsplit%3Dtrain%26amp%3Boffset%3D0%26amp%3Blength%3D10"
Try multiple offsets:
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Frows%3Fdataset%3Dcjweaver%2FARU_speech_corpus%26amp%3Bconfig%3Ddefault%26amp%3Bsplit%3Dtrain%26amp%3Boffset%3D100%26amp%3Blength%3D10"
curl "/static-proxy?url=https%3A%2F%2Fdatasets-server.huggingface.co%2Frows%3Fdataset%3Dcjweaver%2FARU_speech_corpus%26amp%3Bconfig%3Ddefault%26amp%3Bsplit%3Dtrain%26amp%3Boffset%3D1000%26amp%3Blength%3D10"
Interpretation:
| API result | Meaning |
|---|---|
| /splits works | Dataset config/split structure is recognized. |
| /parquet works | Auto-converted Parquet exists. |
| /first-rows fails | First-row post-processing or asset generation may be failing. |
| /rows fails at specific offsets | Specific files may be problematic. |
| /rows returns audio URLs, but browser does not play them | Browser/player compatibility is likely. |
| 48/16 preview config works | Source WAV profile was likely the practical trigger. |
2. Local WAV validation
Run:
ffprobe -hide_banner -show_format -show_streams "example.wav"
Look for:
codec_name=pcm_s24le
sample_rate=65536
bits_per_sample=24
channels=1
Batch check:
find train validation test -name "*.wav" -print0 |
while IFS= read -r -d '' f; do
ffprobe -v error -select_streams a:0 \
-show_entries stream=codec_name,sample_rate,bits_per_sample,channels,duration \
-of default=nw=1:nk=0 "$f" >/dev/null || echo "BAD: $f"
done
This does not prove browser compatibility, but it helps rule out corrupt files, inconsistent headers, unexpected channel counts, or nonstandard PCM variants.
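A soundfile-based cross-check of the same properties (libsndfile rather than FFmpeg, so it is a useful second opinion) could look like this; the expected profile values come from the dataset card:

```python
from pathlib import Path

import soundfile as sf

for split in ("train", "validation", "test"):
    for path in sorted(Path(split).rglob("*.wav")):
        try:
            info = sf.info(path)
        except RuntimeError as err:
            # Unreadable or corrupt file.
            print(f"BAD: {path} ({err})")
            continue
        if info.samplerate != 65_536 or info.channels != 1 or info.subtype != "PCM_24":
            print(f"UNEXPECTED PROFILE: {path} -> {info.samplerate} Hz, "
                  f"{info.channels} ch, {info.subtype}")
```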
3. A/B preview test
Create two tiny configs or a test dataset:
test_original/
    metadata.csv
    *.wav              # 65,536 Hz / 24-bit
test_preview_48k16/
    metadata.csv
    *.wav              # 48,000 Hz / 16-bit
Then compare.
| Result | Interpretation |
|---|---|
| Original fails, 48/16 works | Strong evidence that the original WAV profile is the issue. |
| Both fail | More likely a metadata/layout/viewer issue. |
| Both work in API but original fails in browser | Browser playback issue. |
| Preview works everywhere | Add the preview config to the real dataset. |
Dataset-card wording I would use
## Audio format and browser preview
The canonical ARU Speech Corpus audio is stored as mono 65,536 Hz / 24-bit PCM WAV, matching the original measurement-oriented acquisition format.
This sample rate and bit depth are useful for archival and acoustic research, but may not be reliably playable in browser-based dataset preview players.
For convenience, this repository includes a separate `browser_preview_48k16` configuration containing selected utterances converted to 48 kHz / 16-bit PCM WAV. These files are intended only for quick listening and inspection in the Hub UI.
For training, evaluation, or acoustic analysis, use the canonical `default` configuration and resample explicitly for your target workflow.
Programmatic loading example:
from datasets import load_dataset, Audio
# Canonical source audio
ds = load_dataset("cjweaver/ARU_speech_corpus", split="train")
# Example: resample on access for a 16 kHz ASR model
ds_16k = ds.cast_column("audio", Audio(sampling_rate=16_000))
This mirrors the official datasets path for audio resampling rather than implying that the Hub web player will perform the conversion.
What I would ask Hugging Face
A precise support/discussion question would be:
The dataset uses canonical 65,536 Hz / 24-bit PCM WAV files. The public Dataset Viewer backend appears to treat .wav as a supported extension and may return an AudioSource pointing at the original dataset file with MIME type audio/wav, rather than resampling/transcoding it. Is the Dataset Preview player expected to support 65,536 Hz / 24-bit WAV directly, or should datasets with nonstandard WAV profiles provide a separate browser-preview config at 44.1/16 or 48/16?
That question is better than only asking “what sample rates are supported?” because it reflects the actual backend behavior visible in the code.
Final recommendation
For this dataset, I would do this:
- Keep the original 65,536 Hz / 24-bit WAV files as the canonical default dataset.
- Do not rely on dataset_info.sample_rate or Audio(sampling_rate=...) to fix the Hub player.
- Add a separate preview config named browser_preview_48k16 or browser_preview_44k16.
- Use 48 kHz / 16-bit PCM WAV for the preview files if choosing one default.
- Start with a representative subset, around 120 files, covering all speakers and splits.
- Include metadata linking every preview file back to the canonical source file.
- Use the Dataset Viewer API to confirm whether the current error is row generation, asset generation, or browser playback.
- If 48/16 preview files still fail, investigate metadata/config/layout rather than sample rate.
Short summary
- Hugging Face documents audio dataset support for WAV, but not a precise Dataset Viewer sample-rate whitelist.
- The public Dataset Viewer backend code appears to support audio by extension/MIME type, not by a visible sample-rate list.
- Existing .wav files may be passed through to the frontend/browser unchanged.
- sampling_rate metadata helps Python loading/resampling; it should not be assumed to transcode Hub preview audio.
- The original 65,536 Hz / 24-bit WAV files are valuable as canonical research audio.
- A separate browser_preview_48k16 or browser_preview_44k16 config is the clean, low-risk workaround.