The `sz_tolower` function requires copying the input into a separate buffer. Within the `find` and `find_byte` routines, a lowercasing step can instead be applied on the fly, costing only a few extra cycles. I'm happy to add the feature; just gauging interest here.
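The core idea can be sketched in portable C without intrinsics: lower each haystack byte at comparison time instead of building a lowered copy. This is a minimal sketch that assumes ASCII-only folding. `ascii_lower`, `find_byte_icase`, and `find_icase` are hypothetical names, not StringZilla API. A scalar version like this could also act as a reference when testing a SIMD kernel.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of StringZilla's API. */

/* ASCII-only lowering: maps 'A'..'Z' to 'a'..'z' and leaves every other byte unchanged. */
static unsigned char ascii_lower(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive byte search: each haystack byte is lowered as it is compared,
 * so no lowered copy of the haystack is ever allocated. Returns NULL if absent. */
static const char *find_byte_icase(const char *h, size_t h_length, char n) {
    unsigned char target = ascii_lower((unsigned char)n);
    for (size_t i = 0; i < h_length; ++i)
        if (ascii_lower((unsigned char)h[i]) == target)
            return h + i;
    return NULL;
}

/* Case-insensitive substring search using a naive O(h_length * n_length) scan.
 * An empty needle matches at the start of the haystack. */
static const char *find_icase(const char *h, size_t h_length, const char *n, size_t n_length) {
    if (n_length > h_length)
        return NULL;
    for (size_t i = 0; i + n_length <= h_length; ++i) {
        size_t j = 0;
        while (j < n_length && ascii_lower((unsigned char)h[i + j]) == ascii_lower((unsigned char)n[j]))
            ++j;
        if (j == n_length)
            return h + i;
    }
    return NULL;
}
```

For example, searching `"Hello, StringZilla!"` for `'z'` returns a pointer to offset 13 (the `Z`), and searching it for `"STRINGZILLA"` returns offset 7.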
For AVX-512:
```c
SZ_INTERNAL __m512i sz_lower_avx512(__m512i in) {
    __m512i A = _mm512_set1_epi8('A');
    __m512i Z = _mm512_set1_epi8('Z');
    __m512i to_lower = _mm512_set1_epi8('a' - 'A');
    // Signed byte compares are safe here: bytes >= 0x80 are negative, so they never fall in 'A'..'Z'.
    __mmask64 ge_A = _mm512_cmpge_epi8_mask(in, A);
    __mmask64 le_Z = _mm512_cmple_epi8_mask(in, Z);
    __mmask64 is_upper = _kand_mask64(ge_A, le_Z);
    // Add 0x20 only in lanes holding 'A'..'Z'; all other lanes pass through unchanged.
    return _mm512_mask_add_epi8(in, is_upper, in, to_lower);
}

SZ_PUBLIC sz_cptr_t sz_find_byte_case_insensitive_avx512(sz_cptr_t h, sz_size_t h_length, sz_cptr_t n) {
    __mmask64 mask;
    sz_u512_vec_t h_vec, n_vec;
    // PATCH: lowercase the needle byte once, up front.
    n_vec.zmm = _mm512_set1_epi8(sz_u8_tolower(n[0]));
    while (h_length >= 64) {
        // PATCH: lowercase each 64-byte haystack block before comparing.
        h_vec.zmm = sz_lower_avx512(_mm512_loadu_si512(h));
        mask = _mm512_cmpeq_epi8_mask(h_vec.zmm, n_vec.zmm);
        if (mask)
            return h + sz_u64_ctz(mask);
        h += 64, h_length -= 64;
    }
    if (h_length) {
        mask = _sz_u64_mask_until(h_length);
        // PATCH: lowercase the tail; lanes past the end are zeroed by the masked load.
        h_vec.zmm = sz_lower_avx512(_mm512_maskz_loadu_epi8(mask, h));
        // Reuse the same `mask` variable to find the first matching byte within the tail.
        mask = _mm512_mask_cmpeq_epu8_mask(mask, h_vec.zmm, n_vec.zmm);
        if (mask)
            return h + sz_u64_ctz(mask);
    }
    return SZ_NULL_CHAR;
}
```
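On targets without AVX-512, the range-check-and-add trick behind `sz_lower_avx512` can be approximated eight bytes at a time using SWAR (SIMD-within-a-register) arithmetic. This is a minimal sketch that assumes the same ASCII-only folding as the kernel above. `ascii_lower_u64` and `ascii_lower_8` are hypothetical names, not StringZilla functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of `sz_lower_avx512`: lowers eight ASCII bytes packed
 * in a 64-bit word. Bytes >= 0x80 (including all UTF-8 multi-byte units) stay unchanged. */
static uint64_t ascii_lower_u64(uint64_t x) {
    const uint64_t ones = 0x0101010101010101ull;
    const uint64_t high = 0x8080808080808080ull;
    /* Clear each byte's top bit so every byte is in 0..0x7F; the biased adds below
     * then stay under 0x100 per byte and never carry into a neighbor. */
    uint64_t low7 = x & ~high;
    uint64_t ge_a = low7 + ones * (0x80 - 'A');     /* top bit set iff low7 >= 'A' */
    uint64_t gt_z = low7 + ones * (0x80 - 'Z' - 1); /* top bit set iff low7 >  'Z' */
    /* Top bit set iff 'A' <= byte <= 'Z' and the original byte was < 0x80. */
    uint64_t is_upper = ge_a & ~gt_z & ~x & high;
    /* 0x80 >> 2 == 0x20 == 'a' - 'A'; uppercase letters have bit 5 clear, so OR acts as an add. */
    return x | (is_upper >> 2);
}

/* Lowers eight bytes from `src` into `dst`; memcpy sidesteps unaligned-access issues. */
static void ascii_lower_8(const char *src, char *dst) {
    uint64_t word;
    memcpy(&word, src, sizeof(word));
    word = ascii_lower_u64(word);
    memcpy(dst, &word, sizeof(word));
}
```

Instead of two compares and a masked add, each byte's range check is folded into its top bit by a biased addition. The result does not depend on byte order, because every operation is lane-local.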
Can you contribute to the implementation?
I can contribute
Is your feature request specific to a certain interface?
It applies to everything.

I've considered this before, and it raises questions about character-set encodings. The suggested lowering function limits applicability to ASCII content. There is probably a better way to future-proof the API for UTF-8 as well. Feel free to open a PR, and I'll modify and integrate it down the road when we get to it.
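The ASCII-only limitation can be made concrete. In UTF-8, every byte of a multi-byte sequence is 0x80 or above, so an `'A'..'Z'` rule never corrupts non-ASCII text, but it also never folds it. `É` (U+00C9, bytes `C3 89`) and `é` (U+00E9, bytes `C3 A9`) therefore still compare unequal. A minimal sketch follows; `ascii_lower` and `ascii_iequal` are hypothetical helper names.

```c
#include <stddef.h>

/* Same ASCII-only rule as the proposed kernel: only 'A'..'Z' change. */
static unsigned char ascii_lower(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Returns 1 if the two byte ranges are equal after ASCII-only lowering, else 0. */
static int ascii_iequal(const char *a, const char *b, size_t length) {
    for (size_t i = 0; i < length; ++i)
        if (ascii_lower((unsigned char)a[i]) != ascii_lower((unsigned char)b[i]))
            return 0;
    return 1;
}
```

Here `ascii_iequal("ZILLA", "zilla", 5)` returns 1, while `ascii_iequal("\xC3\x89", "\xC3\xA9", 2)` returns 0. Full Unicode case folding would need decoding plus a mapping table, which is the API question raised above.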