issue with IPEX v2.3.110 #718

Open
byclear opened this issue Oct 4, 2024 · 37 comments
Labels
ARC ARC GPU Bug Something isn't working Crash Execution crashes Escalate Windows

Comments

@byclear

byclear commented Oct 4, 2024

Describe the bug

C:\Users\clear\miniconda3\envs\310\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\clear\miniconda3\envs\310\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Collecting environment information...
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 517, in <module>
    main()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 512, in main
    output = get_pretty_env_info()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 507, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 368, in get_env_info
    xpu_available_str = str(torch.xpu.is_available())
  File "C:\Users\clear\miniconda3\envs\310\lib\site-packages\torch\xpu\__init__.py", line 63, in is_available
    return device_count() > 0
  File "C:\Users\clear\miniconda3\envs\310\lib\site-packages\torch\xpu\__init__.py", line 57, in device_count
    return torch._C._xpu_getDeviceCount()
RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)

conda install libuv
python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/lnl/cn/

Versions

C:\Users\clear\miniconda3\envs\310\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\clear\miniconda3\envs\310\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Collecting environment information...
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 517, in <module>
    main()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 512, in main
    output = get_pretty_env_info()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 507, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 368, in get_env_info
    xpu_available_str = str(torch.xpu.is_available())
  File "C:\Users\clear\miniconda3\envs\310\lib\site-packages\torch\xpu\__init__.py", line 63, in is_available
    return device_count() > 0
  File "C:\Users\clear\miniconda3\envs\310\lib\site-packages\torch\xpu\__init__.py", line 57, in device_count
    return torch._C._xpu_getDeviceCount()
RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)

@ZailiWang ZailiWang self-assigned this Oct 5, 2024
@ZailiWang
Contributor

Hi, let me have a check and get back to you. Thanks.

@byclear
Author

byclear commented Oct 5, 2024

Hi, let me have a check and get back to you. Thanks.

OK. Please take a look. Thanks.

@ZailiWang
Contributor

Hi, can you confirm whether you are using an integrated Arc GPU in a Meteor Lake or Lunar Lake processor, or a discrete Arc device? Please be aware that the pip package URLs differ for these devices in the installation guide.

@byclear
Author

byclear commented Oct 9, 2024

Hi, can you confirm whether you are using an integrated Arc GPU in a Meteor Lake or Lunar Lake processor, or a discrete Arc device? Please be aware that the pip package URLs differ for these devices in the installation guide.

I am certain that the device I am using is the Intel A770M. I used 2.1.40 without any problem. But when I use 2.3.110, I get an error. I am using NUC12. Viper Canyon

@byclear
Author

byclear commented Oct 9, 2024

Motherboard: Intel Corporation
Model: NUC12SNKi72
Version: M45201-502
Operating system: Microsoft Windows 11 Enterprise LTSC (64-bit)
Version: 24H2 (10.0.26100)
Version (build number)

Devices and drivers
Processor: 12th Gen Intel® Core™ i7-12700H

Graphics:
Intel® Iris® Xe Graphics
Intel® Arc™ A770M Graphics

@ZailiWang
Contributor

The extra-index-url argument differs between discrete and integrated Arc graphics. In your 2.3.110 installation step you used a URL for Lunar Lake, which was not available when 2.1.40 was released, so it's a bit confusing.

A NUC12 should have a discrete A770M card in it, so please try changing the URL to

--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
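Editorial sketch: the advice above comes down to picking the wheel index that matches the GPU type. The small Python helper below assembles the install command from the two index URLs quoted in this thread. The dictionary keys (`discrete`, `lunar_lake`) and the function name `build_install_command` are hypothetical labels chosen for illustration; other device families have their own URLs in the installation guide and are not covered here.

```python
# Index URLs copied verbatim from this thread; the keys are hypothetical labels.
INDEX_URLS = {
    "discrete": "https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/",
    "lunar_lake": "https://pytorch-extension.intel.com/release-whl/stable/lnl/cn/",
}

# Package pins taken from the install command posted in this issue.
PACKAGES = [
    "torch==2.3.1+cxx11.abi",
    "torchvision==0.18.1+cxx11.abi",
    "torchaudio==2.3.1+cxx11.abi",
    "intel-extension-for-pytorch==2.3.110+xpu",
]


def build_install_command(device_kind: str) -> str:
    """Return the pip command for the given device kind (hypothetical helper)."""
    if device_kind not in INDEX_URLS:
        raise ValueError(f"unknown device kind: {device_kind!r}")
    parts = ["python -m pip install", *PACKAGES,
             "--extra-index-url", INDEX_URLS[device_kind]]
    return " ".join(parts)


# A NUC12 with a discrete A770M would use the "discrete" index.
print(build_install_command("discrete"))
```

Printing the command rather than running it keeps the sketch side-effect free; paste the output into the conda environment after `conda install libuv`.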

@byclear
Author

byclear commented Oct 9, 2024

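The extra-index-url argument differs between discrete and integrated Arc graphics. In your 2.3.110 installation step you used a URL for Lunar Lake, which was not available when 2.1.40 was released, so it's a bit confusing.

A NUC12 should have a discrete A770M card in it, so please try changing the URL to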

--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

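Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\clear\miniconda3\envs\torch\lib\site-packages\torch\__init__.py", line 143, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\clear\miniconda3\envs\torch\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

Do I have to repeat what I did for 2.1.40 and install oneAPI again? I would still prefer to use the prebuilt packages, to see whether they are more stable and faster. oneAPI is no longer even maintained.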

@ZailiWang
Contributor

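Ah, no. Let me switch to Chinese as well. Just keep following the 2.3.110 installation guide, but when you run python -m pip install to install IPEX, make sure to change the final extra-index-url argument to the one I gave above (rather than the one ending in /lnl/us from your original post).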

@byclear
Author

byclear commented Oct 9, 2024

python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

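Could you post the complete command, please? The command above is what I used.

Ah, no. Let me switch to Chinese as well. Just keep following the 2.3.110 installation guide, but when you run python -m pip install to install IPEX, make sure to change the final extra-index-url argument to the one I gave above (rather than the one ending in /lnl/us from your original post).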

@byclear
Author

byclear commented Oct 9, 2024

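After installing, I get a missing-DLL error. I have reinstalled my operating system. On the previous system I had oneAPI installed, and it did work. Now, no matter how many times I retry, it seems unable to find the GPU. Installing according to the documentation gives RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE). Installing with the URL you gave produces a missing-DLL error: OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\clear\miniconda3\envs\torch\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.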

@byclear
Author

byclear commented Oct 9, 2024

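It seems the only way to solve this problem is to install oneAPI and add the oneAPI DLLs to the environment variables. But I would like to use a prebuilt package if there is one, to see whether it is faster.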

@ZailiWang
Contributor

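Do you mean that, after reinstalling the system, installing 2.3.110 still fails unless oneAPI is installed separately? And that it is still the same even after changing the extra-index-url?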

@jingxu10
Contributor

jingxu10 commented Oct 9, 2024

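After installing, I get a missing-DLL error. I have reinstalled my operating system. On the previous system I had oneAPI installed, and it did work. Now, no matter how many times I retry, it seems unable to find the GPU. Installing according to the documentation gives RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE). Installing with the URL you gave produces a missing-DLL error: OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\clear\miniconda3\envs\torch\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

Did you run conda install libuv?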

@byclear
Author

byclear commented Oct 9, 2024

Oh, sorry, I indeed had not run conda install libuv. After running it, the problem is still the same.

C:\Users\clear\miniconda3\envs\pytorch\lib\site-packages\intel_extension_for_pytorch\llm\__init__.py:9: UserWarning: failed to use huggingface generation fuctions due to: No module named 'transformers'.
  warnings.warn(f"failed to use huggingface generation fuctions due to: {e}.")
2.3.1+cxx11.abi
2.3.110+xpu
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\clear\miniconda3\envs\pytorch\lib\site-packages\torch\xpu\__init__.py", line 57, in device_count
    return torch._C._xpu_getDeviceCount()
RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)

My commands:
conda install libuv
python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

@byclear
Author

byclear commented Oct 9, 2024

Collecting environment information...
PyTorch version: 2.1.0.post3+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.40+xpu
IPEX commit: 80ed476
Build type: Release

OS: N/A
GCC version: N/A
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.26100-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu', driver_version='1.3.30714', has_fp64=0, total_memory=30149MB, max_compute_units=96, gpu_eu_count=96)
[1] _DeviceProperties(name='Intel(R) Arc(TM) A770M Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu', driver_version='1.3.30714', has_fp64=0, total_memory=15930MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
'wmic' is not recognized as an internal or external command, operable program or batch file.

Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.1.40+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0.post3+cxx11.abi
[pip3] torchaudio==2.1.0.post3+cxx11.abi
[pip3] torchvision==0.16.0.post3+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.40+xpu pypi_0 pypi
[conda] mkl 2024.2.1 pypi_0 pypi
[conda] mkl-dpcpp 2024.2.1 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] onemkl-sycl-blas 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-datafitting 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-dft 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-lapack 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-rng 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-sparse 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-stats 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-vm 2024.2.1 pypi_0 pypi
[conda] torch 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0.post3+cxx11.abi pypi_0 pypi

When I downgrade to 2.1.40, the XPU is detected.

@ZailiWang
Contributor

OK, we will look into this further.

@jingxu10 jingxu10 added ARC ARC GPU Crash Execution crashes Windows labels Oct 10, 2024
@ZailiWang
Contributor

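Hi @byclear, just to confirm: in the driver installation step, did you install the one marked with the red box?
[image]
Also, when you tried rolling back to 2.1.40 yesterday, you did not reinstall the driver while switching IPEX versions, right?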

@byclear
Author

byclear commented Oct 10, 2024

Yes, I have not reinstalled the drivers. I'm using 32.0.101.6079 WHQL. I'll test it by installing the specified version 32.0.101.5768 now. Reply later with the results. Thanks for the help. @ZailiWang

@byclear
Author

byclear commented Oct 10, 2024

I uninstalled 32.0.101.6079, rebooted, and then installed 32.0.101.5768. After the installation I rebooted again and ran the following commands:
conda install libuv
python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

The error is still the same:
Collecting environment information...
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 516, in <module>
    main()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 511, in main
    output = get_pretty_env_info()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 506, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 367, in get_env_info
    xpu_available_str = str(torch.xpu.is_available())
  File "C:\Users\clear\miniconda3\envs\pytorch\lib\site-packages\torch\xpu\__init__.py", line 63, in is_available
    return device_count() > 0
  File "C:\Users\clear\miniconda3\envs\pytorch\lib\site-packages\torch\xpu\__init__.py", line 57, in device_count
    return torch._C._xpu_getDeviceCount()
RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)

@ZailiWang

@ZailiWang
Contributor

OK. Could you also confirm whether rolling back to 2.1.40 works properly now?

@byclear
Author

byclear commented Oct 10, 2024

Collecting environment information...
PyTorch version: 2.1.0.post3+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.40+xpu
IPEX commit: 80ed476
Build type: Release

OS: N/A
GCC version: N/A
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.26100-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu', driver_version='1.3.29803', has_fp64=0, total_memory=30149MB, max_compute_units=96, gpu_eu_count=96)
[1] _DeviceProperties(name='Intel(R) Arc(TM) A770M Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu', driver_version='1.3.29803', has_fp64=0, total_memory=15930MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
'wmic' is not recognized as an internal or external command, operable program or batch file.

Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.1.40+xpu
[pip3] numpy==2.1.2
[pip3] torch==2.1.0.post3+cxx11.abi
[pip3] torchaudio==2.1.0.post3+cxx11.abi
[pip3] torchvision==0.16.0.post3+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.40+xpu pypi_0 pypi
[conda] mkl 2024.2.1 pypi_0 pypi
[conda] mkl-dpcpp 2024.2.1 pypi_0 pypi
[conda] numpy 2.1.2 pypi_0 pypi
[conda] onemkl-sycl-blas 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-datafitting 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-dft 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-lapack 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-rng 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-sparse 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-stats 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-vm 2024.2.1 pypi_0 pypi
[conda] torch 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0.post3+cxx11.abi pypi_0 pypi

It runs with no errors. This is the detailed report. @ZailiWang

@ZailiWang
Contributor

Thanks. I will ask around internally.

@ZailiWang
Contributor

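Could you also help test what happens if you disable the integrated GPU in Device Manager, that is, this one:

[0] _DeviceProperties(name='Intel(R) Iris(R) Xe Graphics'

and check whether 2.3.110 then stops reporting the error on the Arc A770M? Find the integrated GPU in Device Manager, then right-click -> Disable. That should be enough.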

@ZailiWang
Contributor

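If the Arc A770M works once the integrated GPU is disabled, please use that as a temporary workaround. We will fix this bug as quickly as we can. Very sorry for the inconvenience.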

@byclear
Author

byclear commented Oct 11, 2024

Collecting environment information...
PyTorch version: 2.3.1+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.3.110+xpu
IPEX commit: 95c9459
Build type: Release

OS: N/A
GCC version: N/A
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.26100-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _XpuDeviceProperties(name='Intel(R) Arc(TM) A770M Graphics', platform_name='Intel(R) Level-Zero', type='gpu', driver_version='1.3.29803', total_memory=15930MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
'wmic' is not recognized as an internal or external command, operable program or batch file.

Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.3.110+xpu
[pip3] numpy==2.1.2
[pip3] torch==2.3.1+cxx11.abi
[pip3] torchaudio==2.3.1+cxx11.abi
[pip3] torchvision==0.18.1+cxx11.abi
[conda] intel-extension-for-pytorch 2.3.110+xpu pypi_0 pypi
[conda] mkl 2024.2.1 pypi_0 pypi
[conda] mkl-dpcpp 2024.2.1 pypi_0 pypi
[conda] numpy 2.1.2 pypi_0 pypi
[conda] onemkl-sycl-blas 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-datafitting 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-dft 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-lapack 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-rng 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-sparse 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-stats 2024.2.1 pypi_0 pypi
[conda] onemkl-sycl-vm 2024.2.1 pypi_0 pypi
[conda] torch 2.3.1+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.3.1+cxx11.abi pypi_0 pypi
[conda] torchvision 0.18.1+cxx11.abi pypi_0 pypi

Indeed. It works once the integrated GPU is disabled.

@ZailiWang ZailiWang added the Bug Something isn't working label Oct 11, 2024
@ZurrTum

ZurrTum commented Oct 12, 2024

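If the Arc A770M works once the integrated GPU is disabled, please use that as a temporary workaround. We will fix this bug as quickly as we can. Very sorry for the inconvenience.

@ZailiWang With this approach, some displays may not be compatible with the discrete GPU, so only the Microsoft Basic Display Driver can be used.
Disabling the integrated GPU is not suitable long-term: when the frame rate output by the GPU is out of sync with the display's refresh rate, the picture tears.

Processor: 13th Gen Intel(R) Core(TM) i9-13900H, 2600 MHz, 14 cores, 20 logical processors
Adapter type: Intel(R) Iris(R) Xe Graphics Family, Intel Corporation compatible
Adapter type: Intel(R) Arc(TM) A370M Graphics Family, Intel Corporation compatible

The integrated GPU supports 2560x1440 at 100 Hz; the discrete GPU only manages 2560x1440 at 64 Hz
Monitor: Integrated Monitor (LEN-A570-A-C)
System SKU: LENOVO_MT_F0GQ_BU_Lenovo_FM_XiaoXinPro 27-IRH

Python 3.12.4 | packaged by Anaconda, Inc.
intel_extension_for_pytorch 2.3.110+gitccf9c15
torch 2.3.0a0+git63d5e92
Driver version: 32.0.101.6083_101.5736 (Latest 10/7/2024)

Version 2.3.110 fails with the -33 (PI_ERROR_INVALID_DEVICE) error; disabling the integrated GPU or rolling back to 2.1.40+xpu works normally.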

@ZailiWang
Contributor

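This has been confirmed as a bug and will be fixed. Disabling the integrated GPU is only a temporary workaround until the fix is available.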

@DurianyDoriana

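This has been confirmed as a bug and will be fixed. Disabling the integrated GPU is only a temporary workaround until the fix is available.

Is this the reason IntelAI version 1.21b has problems with Arc GPUs and iGPUs?

Does it also use 2.3.110+xpu?

I filed the following bug report: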
intel/AI-Playground#76

@ZailiWang
Contributor

Hi, the issue has been fixed. Would you retry with re-installation

python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ --force-reinstall

and check if the issue is resolved at your side. Thanks!

@byclear
Author

byclear commented Oct 30, 2024

Hi, the issue has been fixed. Would you retry with re-installation

python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ --force-reinstall

and check if the issue is resolved at your side. Thanks!

Collecting environment information...
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 516, in <module>
    main()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 511, in main
    output = get_pretty_env_info()
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 506, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "C:\pythonxm\测试ARC显卡\collect_env.py", line 367, in get_env_info
    xpu_available_str = str(torch.xpu.is_available())
  File "C:\Users\clear\miniconda3\envs\xpu\lib\site-packages\torch\xpu\__init__.py", line 63, in is_available
    return device_count() > 0
  File "C:\Users\clear\miniconda3\envs\xpu\lib\site-packages\torch\xpu\__init__.py", line 57, in device_count
    return torch._C._xpu_getDeviceCount()
RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)

It's still the same. Did you paste the wrong command?

@byclear
Author

byclear commented Oct 30, 2024

annotated-types 0.7.0 pypi_0 pypi
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2024.9.24 haa95532_0
dpcpp-cpp-rt 2024.2.1 pypi_0 pypi
filelock 3.16.1 pypi_0 pypi
fsspec 2024.10.0 pypi_0 pypi
intel-cmplr-lib-rt 2024.2.1 pypi_0 pypi
intel-cmplr-lib-ur 2024.2.1 pypi_0 pypi
intel-cmplr-lic-rt 2024.2.1 pypi_0 pypi
intel-extension-for-pytorch 2.3.110+xpu pypi_0 pypi
intel-opencl-rt 2024.2.1 pypi_0 pypi
intel-openmp 2024.2.1 pypi_0 pypi
intel-sycl-rt 2024.2.1 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
libffi 3.4.4 hd77b12b_1
libuv 1.48.0 h827c3e9_0
markupsafe 3.0.2 pypi_0 pypi
mkl 2024.2.1 pypi_0 pypi
mkl-dpcpp 2024.2.1 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
networkx 3.4.2 pypi_0 pypi
numpy 2.1.2 pypi_0 pypi
onemkl-sycl-blas 2024.2.1 pypi_0 pypi
onemkl-sycl-datafitting 2024.2.1 pypi_0 pypi
onemkl-sycl-dft 2024.2.1 pypi_0 pypi
onemkl-sycl-lapack 2024.2.1 pypi_0 pypi
onemkl-sycl-rng 2024.2.1 pypi_0 pypi
onemkl-sycl-sparse 2024.2.1 pypi_0 pypi
onemkl-sycl-stats 2024.2.1 pypi_0 pypi
onemkl-sycl-vm 2024.2.1 pypi_0 pypi
openssl 3.0.15 h827c3e9_0
packaging 24.1 pypi_0 pypi
pillow 11.0.0 pypi_0 pypi
pip 24.2 py310haa95532_0
psutil 6.1.0 pypi_0 pypi
pydantic 2.9.2 pypi_0 pypi
pydantic-core 2.23.4 pypi_0 pypi
python 3.10.15 h4607a30_1
ruamel-yaml 0.18.6 pypi_0 pypi
ruamel-yaml-clib 0.2.12 pypi_0 pypi
setuptools 75.1.0 py310haa95532_0
sqlite 3.45.3 h2bbff1b_0
sympy 1.13.3 pypi_0 pypi
tbb 2021.13.1 pypi_0 pypi
tk 8.6.14 h0416ee5_0
torch 2.3.1+cxx11.abi pypi_0 pypi
torchaudio 2.3.1+cxx11.abi pypi_0 pypi
torchvision 0.18.1+cxx11.abi pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024b h04d1e81_0
vc 14.40 h2eaa2aa_1
vs2015_runtime 14.40.33807 h98bb1dd_1
wheel 0.44.0 py310haa95532_0
xz 5.4.6 h8cc25b3_1
zlib 1.2.13 h8cc25b3_1

It fails as soon as I run it.

@ZailiWang
Contributor

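Sorry, the information I received was wrong. The fix for this issue has not yet been officially rolled into the current installation packages. I will let you know once the fix is confirmed as released.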

@Nuullll

Nuullll commented Nov 4, 2024

@ZailiWang Did you post the wrong command accidentally? I tried the latest v2.3.110 hotfix (post0), it worked.

python -m pip install torch==2.3.1.post0+cxx11.abi torchvision==0.18.1.post0+cxx11.abi torchaudio==2.3.1.post0+cxx11.abi intel-extension-for-pytorch==2.3.110.post0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
C:\Users\vfirs\miniforge3\envs\arc311\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\vfirs\miniforge3\envs\arc311\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
C:\Users\vfirs\miniforge3\envs\arc311\Lib\site-packages\intel_extension_for_pytorch\llm\__init__.py:9: UserWarning: failed to use huggingface generation fuctions due to: No module named 'transformers'.
  warnings.warn(f"failed to use huggingface generation fuctions due to: {e}.")
2.3.1.post0+cxx11.abi
2.3.110.post0+xpu
[0]: _XpuDeviceProperties(name='Intel(R) UHD Graphics 770', platform_name='Intel(R) Level-Zero', type='gpu', driver_version='1.3.30398', total_memory=14829MB, max_compute_units=32, gpu_eu_count=32, gpu_subslice_count=4, max_work_group_size=512, max_num_sub_groups=64, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
[1]: _XpuDeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', type='gpu', driver_version='1.3.30398', total_memory=15930MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
[2]: _XpuDeviceProperties(name='Intel(R) Arc(TM) A750 Graphics', platform_name='Intel(R) Level-Zero', type='gpu', driver_version='1.3.30398', total_memory=7934MB, max_compute_units=448, gpu_eu_count=448, gpu_subslice_count=56, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
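Editorial sketch: the one-liner above can be written as a small script that reports the failure instead of dying with a traceback, which is handy while testing driver and package combinations. It uses only the calls that already appear in this thread (`torch.xpu.device_count()` and `torch.xpu.get_device_properties(i)`); the function name `list_xpu_devices` is hypothetical.

```python
def list_xpu_devices():
    """Return one description string per visible XPU device, or [] on failure."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (imported as in the one-liner above)
    except (ImportError, OSError) as exc:
        # OSError covers the "[WinError 126] ... or one of its dependencies" case.
        print(f"import failed: {exc}")
        return []
    try:
        count = torch.xpu.device_count()
    except (RuntimeError, AttributeError) as exc:
        # RuntimeError is what the -33 (PI_ERROR_INVALID_DEVICE) failure raises here.
        print(f"device enumeration failed: {exc}")
        return []
    return [f"[{i}]: {torch.xpu.get_device_properties(i)}" for i in range(count)]


for line in list_xpu_devices():
    print(line)
```

An empty result together with a printed message tells you whether the import or the device enumeration step is the one that fails.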

@ZailiWang
Contributor

Yeah, the hotfix release for this issue has just been released. Thanks for the confirmation. @byclear Please check again by reinstalling the packages; this time it should work.

@byclear
Author

byclear commented Nov 6, 2024

2.3.1.post0+cxx11.abi
2.3.110.post0+xpu
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\test.py", line 38, in <module>
    print(torch.xpu.empty_cache())
  File "C:\Users\clear\miniconda3\envs\xpu\lib\site-packages\intel_extension_for_pytorch\xpu\memory.py", line 21, in empty_cache
    intel_extension_for_pytorch._C._emptyCache()
RuntimeError: Queue cannot be constructed with the given context and device since the device is neither a member of the context nor a descendant of its member. -33 (PI_ERROR_INVALID_DEVICE)

The related functions still raise errors, especially when I want to specify a GPU:
t = torch.Tensor([1., 2.])
print(t.to("xpu:1"))
Traceback (most recent call last):
  File "C:\pythonxm\测试ARC显卡\test.py", line 41, in <module>
    print(t.to("xpu:1"))
RuntimeError: Queue cannot be constructed with the given context and device since the device is neither a member of the context nor a descendant of its member. -33 (PI_ERROR_INVALID_DEVICE)
Please fix this as soon as possible...
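Editorial sketch: since the device order differs between machines in this thread (the Arc card is `[1]` on one system and `[0]` once the iGPU is disabled), it may be safer to pick the device by name than to hard-code `xpu:1`. The helper `pick_device_index` below is hypothetical and is demonstrated on the device names reported earlier in this issue. It does not work around the PI_ERROR_INVALID_DEVICE bug itself.

```python
def pick_device_index(device_names, keyword="Arc"):
    """Return the index of the first device whose name contains keyword, else None."""
    for index, name in enumerate(device_names):
        if keyword.lower() in name.lower():
            return index
    return None


# Names as reported by collect_env.py earlier in this thread. On a real system
# they could be read from torch.xpu.get_device_properties(i).name.
names = [
    "Intel(R) Iris(R) Xe Graphics",
    "Intel(R) Arc(TM) A770M Graphics",
]

index = pick_device_index(names)
device = f"xpu:{index}" if index is not None else "cpu"
print(device)  # prints "xpu:1" for the list above
```

The resulting string can then be passed to `t.to(device)` in place of the literal `"xpu:1"`.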

@ZailiWang
Contributor

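Sorry, it looks like this issue is still not truly resolved. I have reported it, and we will continue working on a fix as soon as possible.
For now, if you do not want to disable the iGPU in Device Manager, you can set the following environment variable before running your IPEX program:

ONEAPI_DEVICE_SELECTOR=*:1

This also works around the problem for the time being.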

@byclear
Author

byclear commented Nov 6, 2024

import os
os.environ["ONEAPI_DEVICE_SELECTOR"] = "*:1"

OK, thanks. It is running successfully now.
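Editorial sketch: the two lines above only help if they run before the runtime that reads the variable starts up, so it is safest to set it before `torch` is imported. The ordering requirement is an assumption and is not stated in this thread. The helper `select_oneapi_device` below is hypothetical; it sets the variable and reports whether `torch` had already been imported in the current process.

```python
import os
import sys


def select_oneapi_device(selector="*:1"):
    """Set ONEAPI_DEVICE_SELECTOR; return False if torch was already imported."""
    # Heuristic only: if torch is already loaded, the setting may come too late.
    already_imported = "torch" in sys.modules
    os.environ["ONEAPI_DEVICE_SELECTOR"] = selector
    return not already_imported


if select_oneapi_device("*:1"):
    print("ONEAPI_DEVICE_SELECTOR set before importing torch")
else:
    print("warning: torch was imported first; the selector may not take effect")

# Import torch and intel_extension_for_pytorch only after this point.
```

The index in `*:1` follows the device order on the reporter's machine. On a system where the Arc card is listed first, a different index would presumably be needed.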
