
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float' #33552

Open
felipehertzer opened this issue Sep 18, 2024 · 1 comment

@felipehertzer

System Info

  • transformers version: 4.44.2
  • Platform: macOS-15.0-arm64-arm-64bit
  • Python version: 3.12.6
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.6.0.dev20240916 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

@kamilakesbi @ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi, I am attempting to transcribe several audio files; however, the process intermittently encounters an exception with some of the files. The transcription works successfully in approximately 90% of the cases, but certain files trigger this exception unexpectedly. I am attaching one of the audio files that generates this exception for your review. Thank you.

  • I was able to replicate it on macOS on CPU and on Linux on CUDA.

1. Install stable-ts:

pip install stable-ts

2. Run the code:

import stable_whisper

model = stable_whisper.load_hf_whisper('medium')
result = model.transcribe(audio='radio_18596_1726554951_1726554981.mp3')
print(result.text)

Audio sample: https://filebin.net/hivqswoer298m65m

Then I receive the following exception:

Traceback (most recent call last):
  File "/tests/test.py", line 4, in <module>
    result = model.transcribe(
             ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 236, in transcribe
    return transcribe_any(
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/non_whisper.py", line 342, in transcribe_any
    result = inference_func(**inference_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 116, in _inner_transcribe
    output = self._pipe(audio, **pipe_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1255, in __call__
    return next(
           ^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 587, in postprocess
    text, optional = self.tokenizer._decode_asr(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 835, in _decode_asr
    return _decode_asr(
           ^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1086, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1193, in _find_longest_common_sequence
    matches = sum(
              ^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1198, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
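
For reference, the same postprocessing path can likely be exercised without stable-ts by calling the transformers ASR pipeline directly with word-level timestamps and chunked long-form decoding (a minimal sketch; the chunk_length_s value and model size here are assumptions, not taken from the setup above):

from transformers import pipeline

# Hypothetical direct reproduction: return_timestamps="word" together with
# chunking goes through _decode_asr / _find_longest_common_sequence.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    chunk_length_s=30,          # chunked long-form decoding
    return_timestamps="word",   # per-token timestamps
)
result = asr("radio_18596_1726554951_1726554981.mp3")
print(result["text"])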

Expected behavior

To be able to transcribe the audio files without this exception.

@itazap (Contributor) commented Sep 18, 2024

Thanks for raising this. It looks like the branch below can indeed be hit:

if i + 1 < len(token_timestamps):
    end_time = round(token_timestamps[i + 1] + time_offset, 2)
else:
    end_time = None  # should never happen

since this loops over the tokens, and for the last token i + 1 falls out of range:

for i, token in enumerate(token_ids):

cc @eustlb @ylacombe wdyt about how the last timestamp should be handled?
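
One possible direction, sketched only as an illustration and not a confirmed fix: fall back to the last available timestamp instead of None for the final token, so the later <= comparisons in _find_longest_common_sequence never see a NoneType:

# Sketch of a possible guard (hypothetical; the actual handling is up for discussion):
if i + 1 < len(token_timestamps):
    end_time = round(token_timestamps[i + 1] + time_offset, 2)
else:
    # use the current token's timestamp (or the chunk end) instead of None
    end_time = round(token_timestamps[i] + time_offset, 2)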
