Solve zipformer streaming GPU inference #961

Open
wants to merge 1 commit into base: master

Conversation

@whaozl commented Mar 23, 2023

No description provided.

@yaozengwei (Collaborator) commented

The script jit_trace_export.py exports the model with torch.jit.trace. Why replace it with torch.jit.script?

@yfyeung (Collaborator) left a comment

Please explain the reason for this modification.

@whaozl (Author) commented Mar 23, 2023

@yaozengwei Because torch.jit.trace on the encoder model raises RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

See k2-fsa/sherpa#346.

So only the encoder needs to be exported with torch.jit.script; the decoder and joiner are still exported with torch.jit.trace.
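
A minimal sketch of what this split amounts to, assuming the model is loaded as in jit_trace_export.py and exposes encoder, decoder, and joiner submodules; the example-input shapes and output filenames below are illustrative, not the exact PR diff:

```python
import torch

# Hypothetical sketch, not the exact PR change: script the encoder, keep
# tracing for decoder and joiner. `model` is assumed to be the streaming
# zipformer transducer loaded as in jit_trace_export.py.

# Encoder: torch.jit.script keeps device handling symbolic, so the exported
# module is not tied to the device used at export time.
encoder = torch.jit.script(model.encoder)
encoder.save("encoder_jit_script.pt")

# Decoder and joiner: unchanged, still exported with torch.jit.trace.
y = torch.zeros(10, 2, dtype=torch.int64)           # (batch, context) tokens, placeholder shape
decoder = torch.jit.trace(model.decoder, y)
decoder.save("decoder_jit_trace.pt")

enc_out = torch.rand(10, 512, dtype=torch.float32)  # placeholder dims
dec_out = torch.rand(10, 512, dtype=torch.float32)
joiner = torch.jit.trace(model.joiner, (enc_out, dec_out))
joiner.save("joiner_jit_trace.pt")
```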

@yaozengwei (Collaborator) commented

Could you try converting the model to the cuda device instead of cpu when doing the jit.trace export? In that case we would also need to create the example inputs on the cuda device, where the current export script does
x = torch.zeros(1, T, 80, dtype=torch.float32)
I wonder whether we need to export the model on the cuda device when we want to run the model on the cuda device.
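
A hedged sketch of that suggestion, assuming the same model object and 80-dim features as the existing export script; the streaming states and the exact encoder entry point are elided here and would follow jit_trace_export.py:

```python
import torch

device = torch.device("cuda:0")
model.to(device)
model.eval()

# Build the example inputs on the same cuda device before tracing, so the
# trace is recorded entirely with cuda tensors. T is an illustrative number
# of frames; 80 is the feature dimension used by the export script.
T = 39
x = torch.zeros(1, T, 80, dtype=torch.float32, device=device)
x_lens = torch.full((1,), T, dtype=torch.int64, device=device)

traced_encoder = torch.jit.trace(model.encoder, (x, x_lens))
traced_encoder.save("encoder_jit_trace_cuda.pt")
```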

See https://pytorch.org/docs/stable/jit.html#frequently-asked-questions
[Screenshot of the linked PyTorch JIT FAQ section]

@whaozl (Author) commented Mar 23, 2023

@yaozengwei

There are two scenarios:

1. Converting the model to the cuda device instead of cpu when doing the jit.trace export (model.to("cuda:0")) fails with either
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
or
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
because of https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/zipformer.py#L2280-L2281, even after modifying

rows = torch.arange(start=time1 - 1, end=-1, step=-1)

to

rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda()

2. When the model is exported on cpu and sherpa online (https://github.com/k2-fsa/sherpa/blob/master/sherpa/cpp_api/bin/online-recognizer.cc) runs inference on GPU, it fails with
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

So the best solution is to change how the encoder is exported and use torch.jit.script.
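
For reference, a self-contained illustration (not the icefall code; it assumes a CUDA device is available) of how a torch.arange created on cpu at trace time ends up baked into the traced graph and then collides with cuda inputs:

```python
import torch

def toy_forward(x: torch.Tensor) -> torch.Tensor:
    time1 = x.shape[1]
    # As in zipformer.py#L2280-L2281: no device argument, so the tensor is
    # created on the device in use at trace time (CPU here).
    rows = torch.arange(start=time1 - 1, end=-1, step=-1, dtype=x.dtype)
    return x + rows

traced = torch.jit.trace(toy_forward, torch.zeros(1, 5))

traced(torch.zeros(1, 5))                  # fine on CPU
traced(torch.zeros(1, 5, device="cuda"))   # RuntimeError: Expected all tensors to be
                                           # on the same device, but found at least
                                           # two devices, cuda:0 and cpu!
```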

@yaozengwei (Collaborator) commented

Could you successfully export the model if you make the change rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda()?

The reason we export with jit.trace instead of jit.script is that some inference frameworks require it.

@whaozl (Author) commented Mar 23, 2023

When I use rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda(), the export still fails.

So I use torch.jit.script for the encoder model instead; sherpa online then runs successfully with use_gpu.

@yaozengwei (Collaborator) commented

OK. So the exported encoder that you are running on the cuda device is the jit.script version.
