
CStringA::ReverseFind doesn't work as expected when compiled as 64 bit code

Tags: c++, mfc

This simple code asserts (see comment in the code) when the code is compiled as 64 bit. When compiled as 32 bit it doesn't assert.

Actually the problem occurs when CStringA::ReverseFind is used to find a character whose most significant bit is 1.

The code below illustrates the problem: searching for | works, but searching for ¦ fails.

Strangely, only CStringA::ReverseFind has this problem; CStringA::Find works fine.

Other information:

  • Windows 10
  • Visual Studio 2022 17.8.0
  • Platform Toolset Visual Studio 2022 (v143)
  • Windows SDK 10.0.22621.0

#include <afxwin.h>

void TEST()
{
  CStringW st0w(L"AB¦CD");
  CStringA st0a("AB¦CD");
  CStringW st1w(L"AB|CD");
  CStringA st1a("AB|CD");

  ASSERT(st1w.Find(L'|') == 2);
  ASSERT(st1a.Find('|') == 2);
  ASSERT(st0w.Find(L'¦') == 2);
  ASSERT(st0a.Find('¦') == 2);

  ASSERT(st1w.ReverseFind(L'|') == 2);
  ASSERT(st1a.ReverseFind('|') == 2);
  ASSERT(st0w.ReverseFind(L'¦') == 2);
  ASSERT(st0a.ReverseFind('¦') == 2);  // code asserts here
}

int main()
{
   TEST();
}

Is this a bug in MFC?

Has anyone encountered this before?

Jabberwocky asked Apr 16 '26 21:04


1 Answer

It is a bug, though not in MFC but in the underlying standard library implementation, specifically the strrchr function.

Here's my non-MFC repro, which still asserts on x64 at the commented line:

#include <string.h>
#include <wchar.h>   // wcschr, wcsrchr
#include <assert.h>

struct alignas(32) {
    char pad[8];
    wchar_t val[8] = L"AB¦CD";
} st0w;

struct alignas(32) {
    char pad[8];
    char val[8] = "AB¦CD";
} st0a;

struct alignas(32) {
    char pad[8];
    wchar_t val[8] = L"AB|CD";
} st1w;

struct alignas(32) {
    char pad[8];
    char val[8] = "AB|CD";
} st1a;

void test1()
{
    assert(wcschr(st1w.val, L'|') == st1w.val + 2);
    assert(strchr(st1a.val, (unsigned char)'|')  == st1a.val + 2);
    assert(wcschr(st0w.val, L'¦') == st0w.val + 2);
    assert(strchr(st0a.val, (unsigned char)'¦')  == st0a.val + 2);

    assert(wcsrchr(st1w.val, L'|') == st1w.val + 2);
    assert(strrchr(st1a.val, (unsigned char)'|')  == st1a.val + 2);
    assert(wcsrchr(st0w.val, L'¦') == st0w.val + 2);
    assert(strrchr(st0a.val, (unsigned char)'¦')  == st0a.val + 2);  // code asserts here
}

int main()
{
    test1();
}

In the debugger I can see that strrchr has a vectorized implementation. It processes the unaligned leading part without vectorization, then does something fancy with vector instructions.

The difference between x64 and x86 is explained by the fact that they do not share an implementation. The x86 version is written in assembly, not in C or C++.

The alignas(32) and char pad[8] are there to make sure the string is not aligned to the natural vectorization boundary.

I've created a bug report: https://developercommunity.visualstudio.com/t/strrchr-doesnt-work-as-expected-for-x64/10610842

I think the problem is that strrchr sign-extends the characters of the source string to int and then compares integers. Using strrchr(st0a.GetString(), (signed char)'¦') should avoid the issue. MFC applies an (unsigned char) cast internally, so the comparand is zero-extended. The difference between '¦' and '|' is that '¦' is above 0x7F, so sign extension and zero extension produce different results for it.

Alex Guteniev answered Apr 18 '26 11:04


