
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float' #33552

Open
felipehertzer opened this issue Sep 18, 2024 · 1 comment

@felipehertzer

System Info

  • transformers version: 4.44.2
  • Platform: macOS-15.0-arm64-arm-64bit
  • Python version: 3.12.6
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.6.0.dev20240916 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

@kamilakesbi @ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi, I am attempting to transcribe several audio files; however, the process intermittently encounters an exception with some of the files. The transcription works successfully in approximately 90% of the cases, but certain files trigger this exception unexpectedly. I am attaching one of the audio files that generates this exception for your review. Thank you.

  • I was able to replicate it on macOS on CPU and on Linux on CUDA.

1. Install stable-ts:

pip install stable-ts

2. Run the code:

import stable_whisper

model = stable_whisper.load_hf_whisper('medium')
result = model.transcribe(audio='radio_18596_1726554951_1726554981.mp3')
print(result.text)

Audio sample: https://filebin.net/hivqswoer298m65m

Then I receive the following exception:

Traceback (most recent call last):
  File "/tests/test.py", line 4, in <module>
    result = model.transcribe(
             ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 236, in transcribe
    return transcribe_any(
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/non_whisper.py", line 342, in transcribe_any
    result = inference_func(**inference_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 116, in _inner_transcribe
    output = self._pipe(audio, **pipe_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1255, in __call__
    return next(
           ^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 587, in postprocess
    text, optional = self.tokenizer._decode_asr(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 835, in _decode_asr
    return _decode_asr(
           ^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1086, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1193, in _find_longest_common_sequence
    matches = sum(
              ^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1198, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
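
For reference, the same postprocessing path can likely be exercised without stable-ts by calling the transformers ASR pipeline directly with word-level timestamps and chunked long-form decoding (a minimal sketch; the chunk_length_s value and model size here are assumptions, not taken from the setup above):

from transformers import pipeline

# Hypothetical direct reproduction: return_timestamps="word" together with
# chunking goes through _decode_asr / _find_longest_common_sequence.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    chunk_length_s=30,          # chunked long-form decoding
    return_timestamps="word",   # per-token timestamps
)
result = asr("radio_18596_1726554951_1726554981.mp3")
print(result["text"])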

Expected behavior

To be able to transcribe the audio files without this exception.

@itazap (Contributor) commented Sep 18, 2024

Thanks for raising this. It looks like the branch below can indeed be hit:

if i + 1 < len(token_timestamps):
    end_time = round(token_timestamps[i + 1] + time_offset, 2)
else:
    end_time = None  # should never happen

since this loops over the tokens, and for the last token i + 1 falls out of range:

for i, token in enumerate(token_ids):

cc @eustlb @ylacombe wdyt about how the last timestamp should be handled?
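
One possible direction, sketched only as an illustration and not a confirmed fix: fall back to the last available timestamp instead of None for the final token, so the later <= comparisons in _find_longest_common_sequence never see a NoneType:

# Sketch of a possible guard (hypothetical; the actual handling is up for discussion):
if i + 1 < len(token_timestamps):
    end_time = round(token_timestamps[i + 1] + time_offset, 2)
else:
    # use the current token's timestamp (or the chunk end) instead of None
    end_time = round(token_timestamps[i] + time_offset, 2)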
