Suspension and size support

#1
by NiceLook - opened

Hi,
LLM beginner here.
I used the model with ~500 words; there were suspension dots ("...") halfway through, and the output stopped at that exact point, so half of the text is missing.
Also, suspension dots have no effect on a smaller corpus (> 20 tokens).

Otherwise, it's great to have this tool, but I would like to use it with ~1500 words.

Hi,

Thanks a lot for trying the model and for taking the time to open this issue!

A few clarifications that may explain what you are observing:

1.	Input length / truncation

The current checkpoint was not designed for very long inputs (hundreds of tokens is usually fine, but full documents with ~1500 words are out of scope for now).
On the hosted inference widget, the text is tokenized internally, and long inputs can be truncated or cause the generation to stop earlier than expected. This is probably why your 500-word example gets cut roughly in the middle, around the ellipsis.
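
If you run the model locally, you can check in advance whether an input will be truncated. Here is a minimal sketch, assuming a transformers-style checkpoint (the model id below is a placeholder, not the real one):

```python
from transformers import AutoTokenizer

MODEL_ID = "PARIS/ssml-model"  # placeholder id: substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

text = open("input.txt").read()  # e.g. your ~500-word example
ids = tokenizer(text)["input_ids"]
print(f"{len(ids)} tokens vs. model limit of {tokenizer.model_max_length}")
```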

For longer texts, I recommend:
• splitting the text into smaller chunks (for example by sentences or short paragraphs),
• running the model separately on each chunk,
• then concatenating the SSML outputs.
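
As a rough sketch of that chunking workaround (here `generate_ssml` stands in for whatever inference call you already use, it is not a function from the library):

```python
import re

def split_into_chunks(text, max_words=200):
    # Split on sentence-ending punctuation followed by whitespace,
    # then greedily pack sentences into chunks of at most max_words.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def ssml_for_long_text(text, generate_ssml):
    # Run the model on each chunk, then concatenate the SSML outputs.
    return "\n".join(generate_ssml(chunk) for chunk in split_into_chunks(text))
```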

2.	Suspension dots (...) and prosodic breaks

This model is trained to convert explicit break symbols (for example custom tokens such as #250, #500, etc.) into SSML tags like <break/>.
Plain punctuation such as ... is not explicitly modeled as a prosodic break in this checkpoint, so it is expected that suspension dots alone do not reliably change the output on smaller examples.

If you want ellipsis to correspond to a pause, you can:
• preprocess your text by replacing ... with an explicit break symbol (e.g. a special token that you then map to a <break/> tag),
• or directly insert the desired <break/> tag into the SSML after the model's prediction.
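
A small sketch of the first option; the #500 symbol and the 500 ms duration are illustrative assumptions, not values fixed by the model:

```python
ELLIPSIS = "..."
BREAK_SYMBOL = "#500"  # assumed: any break symbol the checkpoint was trained on

def preprocess(text):
    # Replace plain ellipsis with an explicit break symbol before inference.
    return text.replace(ELLIPSIS, f" {BREAK_SYMBOL} ")

def postprocess(ssml):
    # Map the symbol back to an SSML break tag after the prediction.
    return ssml.replace(BREAK_SYMBOL, '<break time="500ms"/>')
```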

In the future I plan to explore:
• support for longer contexts, and
• a more direct handling of punctuation such as ellipsis.

Thanks again for your feedback; it is very helpful for improving the next versions of the model!

Many thanks for your insightful clarification. I have read the paper published by your team; great work!

Input length

One thing I didn't mention is that I'm running the inference on a cycle-restricted CPU, since I couldn't get ROCm running on my old GPU; that may also explain the truncation problem.

Ellipsis and prosodic breaks

I would like to preprocess the text as little as possible. My goal is to parse the SSML output with JavaScript to create "prosodic web animations", so the smaller the delta between the original text and the model input, the better.

Many thanks for your work, I look forward to updates :)

Hi,

Thank you very much for your message and for reading our paper!

You’re right: the truncation is most likely due to sequence length limits in the current pipeline rather than to your CPU. For now the safest workaround is still to split long texts into smaller chunks.

Your idea of using the SSML for “prosodic web animations” is really cool. To keep the delta with the original text small, you could use a reversible trick such as internally replacing ... with a special token before inference, then mapping it back (and/or to a <break> tag) afterwards in JavaScript.
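
In Python terms, the round trip could look like the sketch below (the sentinel string and `run_model` are hypothetical names, and the same logic ports directly to JavaScript):

```python
SENTINEL = "[PAUSE]"  # hypothetical marker; pick anything the model passes through unchanged

def encode(text):
    # Swap the ellipsis for the sentinel before inference.
    return text.replace("...", SENTINEL)

def decode(ssml):
    # Restore the original ellipsis and attach a <break> tag for the animation layer.
    return ssml.replace(SENTINEL, '...<break time="300ms"/>')

# Round trip: decode(run_model(encode(text))) keeps the visible text identical
# to the source apart from the inserted <break/> tags.
```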

Thanks again for the feedback and for your interest in the project!

Best,
Nassima

nassimaODL changed discussion status to closed
