[QST]compiling error when use cutlass::arch::OpClassSimt in cutlass::gemm::kernel::DefaultGemmGrouped #1973

alephchang · 2024-12-06T09:55:06Z

I am trying GemmGrouped in cutlass. with cutlass::arch::OpClassTensorOp, the following codes can be compiled successfully. but with cutlass::arch::OpClassSimt, there is compiling error.
The reason why I try OpClassSimt is that I need do some grouped Gemm on sm70, with double precision. Currently, I cannot compile with OpClassSimt+sm80. Thanks.

//using MMAOp = cutlass::arch::OpClassSimt; //compiling error with this
using MMAOp = cutlass::arch::OpClassTensorOp; //compile successfully

using ElemType = double;
using ElementA = ElemType  ;
using ElementB = ElemType  ;
using ElementOutput = ElemType  ;
using ElementAccumulator = double;
using LayoutA = cutlass::layout::ColumnMajor;
using LayoutB = cutlass::layout::ColumnMajor;
using LayoutC = cutlass::layout::ColumnMajor;

using Gemm_8 = typename cutlass::gemm::kernel::DefaultGemmGrouped<
    ElementA, LayoutA, cutlass::ComplexTransform::kNone, 1, ElementB, LayoutB, cutlass::ComplexTransform::kNone, 1, ElementOutput, LayoutC,
    ElementAccumulator, MMAOp, cutlass::arch::Sm80, 
    cutlass::gemm::GemmShape<32, 32, 16>, cutlass::gemm::GemmShape<16, 16, 16>, cutlass::gemm::GemmShape<8, 8, 4>,
    cutlass::epilogue::thread::LinearCombination<
    ElementOutput, 64 / cutlass::sizeof_bits<ElementOutput>::value,
    ElementAccumulator, ElementAccumulator>,
    cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle,
    4>::GemmKernel;

compiling error for cutlass::arch::OpClassSimt

cutlass-main/include/cutlass/gemm/kernel/default_gemm_grouped.h(214): error: incomplete type is not allowed
    using DefaultGemmKernel = typename kernel::DefaultGemm<
                                       ^
          detected during instantiation of class "cutlass::gemm::kernel::DefaultGemmGrouped<ElementA, LayoutA, cutlass::ComplexTransform::kNone, kAlignmentA, ElementB, LayoutB, cutlass::ComplexTransform::kNone, kAlignmentB, ElementC, LayoutC, ElementAccumulator, OperatorClass, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, GroupScheduleMode_, Operator, SharedMemoryClear, PermuteDLayout, std::enable_if<<expression>, void>::type> [with ElementA=gs::ElementA, LayoutA=gs::LayoutA, kAlignmentA=1, ElementB=gs::ElementB, LayoutB=gs::LayoutB, kAlignmentB=1, ElementC=gs::ElementOutput, LayoutC=gs::LayoutC, ElementAccumulator=gs::ElementAccumulator, OperatorClass=gs::MMAOp, ArchTag=cutlass::arch::Sm80, ThreadblockShape=cutlass::gemm::GemmShape<32, 32, 16>, WarpShape=cutlass::gemm::GemmShape<16, 16, 16>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<gs::ElementOutput, 1, gs::ElementAccumulator, gs::ElementAccumulator, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, gs::ElementOutput>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle, Stages=4, GroupScheduleMode_=cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly, Operator=cutlass::arch::OpMultiplyAdd, SharedMemoryClear=cutlass::gemm::SharedMemoryClearOption::kNone, PermuteDLayout=cutlass::layout::NoPermute]" at line 45 of xx.cu

cutlass/gemm/kernel/default_gemm_grouped.h(243): error: a class or namespace qualified name is required
      typename DefaultGemmKernel::Mma,
               ^
          detected during instantiation of class "cutlass::gemm::kernel::DefaultGemmGrouped<ElementA, LayoutA, cutlass::ComplexTransform::kNone, kAlignmentA, ElementB, LayoutB, cutlass::ComplexTransform::kNone, kAlignmentB, ElementC, LayoutC, ElementAccumulator, OperatorClass, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, GroupScheduleMode_, Operator, SharedMemoryClear, PermuteDLayout, std::enable_if<<expression>, void>::type> [with ElementA=gs::ElementA, LayoutA=gs::LayoutA, kAlignmentA=1, ElementB=gs::ElementB, LayoutB=gs::LayoutB, kAlignmentB=1, ElementC=gs::ElementOutput, LayoutC=gs::LayoutC, ElementAccumulator=gs::ElementAccumulator, OperatorClass=gs::MMAOp, ArchTag=cutlass::arch::Sm80, ThreadblockShape=cutlass::gemm::GemmShape<32, 32, 16>, WarpShape=cutlass::gemm::GemmShape<16, 16, 16>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<gs::ElementOutput, 1, gs::ElementAccumulator, gs::ElementAccumulator, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, gs::ElementOutput>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle, Stages=4, GroupScheduleMode_=cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly, Operator=cutlass::arch::OpMultiplyAdd, SharedMemoryClear=cutlass::gemm::SharedMemoryClearOption::kNone, PermuteDLayout=cutlass::layout::NoPermute]" at line 45 of xx.cu

cutlass/gemm/kernel/default_gemm_grouped.h(243): error: too few arguments for class template "cutlass::gemm::kernel::GemmGrouped"
      typename DefaultGemmKernel::Mma,
                                ^
          detected during instantiation of class "cutlass::gemm::kernel::DefaultGemmGrouped<ElementA, LayoutA, cutlass::ComplexTransform::kNone, kAlignmentA, ElementB, LayoutB, cutlass::ComplexTransform::kNone, kAlignmentB, ElementC, LayoutC, ElementAccumulator, OperatorClass, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, GroupScheduleMode_, Operator, SharedMemoryClear, PermuteDLayout, std::enable_if<<expression>, void>::type> [with ElementA=gs::ElementA, LayoutA=gs::LayoutA, kAlignmentA=1, ElementB=gs::ElementB, LayoutB=gs::LayoutB, kAlignmentB=1, ElementC=gs::ElementOutput, LayoutC=gs::LayoutC, ElementAccumulator=gs::ElementAccumulator, OperatorClass=gs::MMAOp, ArchTag=cutlass::arch::Sm80, ThreadblockShape=cutlass::gemm::GemmShape<32, 32, 16>, WarpShape=cutlass::gemm::GemmShape<16, 16, 16>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<gs::ElementOutput, 1, gs::ElementAccumulator, gs::ElementAccumulator, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, gs::ElementOutput>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle, Stages=4, GroupScheduleMode_=cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly, Operator=cutlass::arch::OpMultiplyAdd, SharedMemoryClear=cutlass::gemm::SharedMemoryClearOption::kNone, PermuteDLayout=cutlass::layout::NoPermute]" at line 45 of xx.cu

cutlass-main/include/cutlass/gemm/kernel/default_gemm_grouped.h(243): error: expected a ";"
      typename DefaultGemmKernel::Mma,
                                ^
          detected during instantiation of class "cutlass::gemm::kernel::DefaultGemmGrouped<ElementA, LayoutA, cutlass::ComplexTransform::kNone, kAlignmentA, ElementB, LayoutB, cutlass::ComplexTransform::kNone, kAlignmentB, ElementC, LayoutC, ElementAccumulator, OperatorClass, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, GroupScheduleMode_, Operator, SharedMemoryClear, PermuteDLayout, std::enable_if<<expression>, void>::type> [with ElementA=gs::ElementA, LayoutA=gs::LayoutA, kAlignmentA=1, ElementB=gs::ElementB, LayoutB=gs::LayoutB, kAlignmentB=1, ElementC=gs::ElementOutput, LayoutC=gs::LayoutC, ElementAccumulator=gs::ElementAccumulator, OperatorClass=gs::MMAOp, ArchTag=cutlass::arch::Sm80, ThreadblockShape=cutlass::gemm::GemmShape<32, 32, 16>, WarpShape=cutlass::gemm::GemmShape<16, 16, 16>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<gs::ElementOutput, 1, gs::ElementAccumulator, gs::ElementAccumulator, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, gs::ElementOutput>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle, Stages=4, GroupScheduleMode_=cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly, Operator=cutlass::arch::OpMultiplyAdd, SharedMemoryClear=cutlass::gemm::SharedMemoryClearOption::kNone, PermuteDLayout=cutlass::layout::NoPermute]" at line 45 of xx.cu

The text was updated successfully, but these errors were encountered:

jackkosaian · 2024-12-06T17:52:33Z

SIMT kernels require an instruction shape of <1,1,1>. The shape <8,8,4> listed here will not be compatible.

You can try using the template parameters here in your example (but adjusting the data type to be double instead of float).

alephchang · 2024-12-07T01:27:45Z

Thanks, it is really helpful. Now I can build and run on sm70.

using Gemm_7 = typename cutlass::gemm::kernel::DefaultGemmGrouped<
    ElementA, LayoutA, cutlass::ComplexTransform::kNone, 1, ElementB, LayoutB, cutlass::ComplexTransform::kNone, 1, ElementOutput, LayoutC,
    ElementAccumulator, MMAOp, cutlass::arch::Sm70, 
    cutlass::gemm::GemmShape<8, 32, 8>, cutlass::gemm::GemmShape<8, 32, 8>, cutlass::gemm::GemmShape<1, 1, 1>,
    cutlass::epilogue::thread::LinearCombination<ElementOutput, 1, ElementAccumulator, ElementAccumulator>,
    cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle,
    2>::GemmKernel;

alephchang added ? - Needs Triage question Question labels Dec 6, 2024

alephchang closed this as completed Dec 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[QST]compiling error when use cutlass::arch::OpClassSimt in cutlass::gemm::kernel::DefaultGemmGrouped #1973

[QST]compiling error when use cutlass::arch::OpClassSimt in cutlass::gemm::kernel::DefaultGemmGrouped #1973

alephchang commented Dec 6, 2024

jackkosaian commented Dec 6, 2024

alephchang commented Dec 7, 2024

[QST]compiling error when use cutlass::arch::OpClassSimt in cutlass::gemm::kernel::DefaultGemmGrouped #1973

[QST]compiling error when use cutlass::arch::OpClassSimt in cutlass::gemm::kernel::DefaultGemmGrouped #1973

Comments

alephchang commented Dec 6, 2024

jackkosaian commented Dec 6, 2024

alephchang commented Dec 7, 2024