Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Work-around for C# CodeDom causing stack-overflow (CS1647) in csc.exe?

I've got a situation where I need to generate a class with a large string const. Code outside of my control causes my generated CodeDom tree to be emitted to C# source and then later compiled as part of a larger Assembly.

Unfortunately, I've run into a situation whereby if the length of this string exceeds 335440 chars in Win2K8 x64 (926240 in Win2K3 x86), the C# compiler exits with a fatal error:

fatal error CS1647: An expression is too long or complex to compile near 'int'

MSDN says CS1647 is "a stack overflow in the compiler" (no pun intended!). Looking more closely I've determined that the CodeDom "nicely" wraps my string const at 80 chars.This causes the compiler to concatenate over 4193 string chunks which apparently is the stack depth of the C# compiler in x64 NetFx. CSC.exe must internally recursively evaluate this expression to "rehydrate" my single string.

My initial question is this: "does anyone know of a work-around to change how the code generator emits strings?" I cannot control the fact that the external system uses C# source as an intermediate and I want this to be a constant (rather than a runtime concatenation of strings).

Alternatively, how can I formulate this expression such that after a certain number of chars, I am still able to create a constant but it is composed of multiple large chunks?

Full repro is here:

// this string breaks CSC: 335440 is Win2K8 x64 max, 926240 is Win2K3 x86 max
string HugeString = new String('X', 926300);

CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
CodeCompileUnit code = new CodeCompileUnit();

// namespace Foo {}
CodeNamespace ns = new CodeNamespace("Foo");
code.Namespaces.Add(ns);

// public class Bar {}
CodeTypeDeclaration type = new CodeTypeDeclaration();
type.IsClass = true;
type.Name = "Bar";
type.Attributes = MemberAttributes.Public;
ns.Types.Add(type);

// public const string HugeString = "XXXX...";

CodeMemberField field = new CodeMemberField();
field.Name = "HugeString";
field.Type = new CodeTypeReference(typeof(String));
field.Attributes = MemberAttributes.Public|MemberAttributes.Const;
field.InitExpression = new CodePrimitiveExpression(HugeString);
type.Members.Add(field);

// generate class file
using (TextWriter writer = File.CreateText("FooBar.cs"))
{
    provider.GenerateCodeFromCompileUnit(code, writer, new CodeGeneratorOptions());
}

// compile class file
CompilerResults results = provider.CompileAssemblyFromFile(new CompilerParameters(), "FooBar.cs");

// output reults
foreach (string msg in results.Output)
{
    Console.WriteLine(msg);
}

// output errors
foreach (CompilerError error in results.Errors)
{
    Console.WriteLine(error);
}
like image 975
mckamey Avatar asked Jun 06 '09 19:06

mckamey


2 Answers

Using a CodeSnippetExpression and a manually quoted string, I was able to emit the source that I would have liked to have seen from Microsoft.CSharp.CSharpCodeGenerator.

So to answer the question above, replace this line:

field.InitExpression = new CodePrimitiveExpression(HugeString);

with this:

field.InitExpression = new CodeSnippetExpression(QuoteSnippetStringCStyle(HugeString));

And finally modify the private string quoting Microsoft.CSharp.CSharpCodeGenerator.QuoteSnippetStringCStyle method to not wrap after 80 chars:

private static string QuoteSnippetStringCStyle(string value)
{
    // CS1647: An expression is too long or complex to compile near '...'
    // happens if number of line wraps is too many (335440 is max for x64, 926240 is max for x86)

    // CS1034: Compiler limit exceeded: Line cannot exceed 16777214 characters
    // theoretically every character could be escaped unicode (6 chars), plus quotes, etc.

    const int LineWrapWidth = (16777214/6) - 4;
    StringBuilder b = new StringBuilder(value.Length+5);

    b.Append("\r\n\"");
    for (int i=0; i<value.Length; i++)
    {
        switch (value[i])
        {
            case '\u2028':
            case '\u2029':
            {
                int ch = (int)value[i];
                b.Append(@"\u");
                b.Append(ch.ToString("X4", CultureInfo.InvariantCulture));
                break;
            }
            case '\\':
            {
                b.Append(@"\\");
                break;
            }
            case '\'':
            {
                b.Append(@"\'");
                break;
            }
            case '\t':
            {
                b.Append(@"\t");
                break;
            }
            case '\n':
            {
                b.Append(@"\n");
                break;
            }
            case '\r':
            {
                b.Append(@"\r");
                break;
            }
            case '"':
            {
                b.Append("\\\"");
                break;
            }
            case '\0':
            {
                b.Append(@"\0");
                break;
            }
            default:
            {
                b.Append(value[i]);
                break;
            }
        }

        if ((i > 0) && ((i % LineWrapWidth) == 0))
        {
            if ((Char.IsHighSurrogate(value[i]) && (i < (value.Length - 1))) && Char.IsLowSurrogate(value[i + 1]))
            {
                b.Append(value[++i]);
            }
            b.Append("\"+\r\n");
            b.Append('"');
        }
    }
    b.Append("\"");
    return b.ToString();
}
like image 63
mckamey Avatar answered Nov 05 '22 09:11

mckamey


So am I right in saying you've got the C# source file with something like:

public const HugeString = "xxxxxxxxxxxx...." +
    "yyyyy....." +
    "zzzzz.....";

and you then try to compile it?

If so, I would try to edit the text file (in code, of course) before compiling. That should be relatively straightforward to do, as presumably they'll follow a rigidly-defined pattern (compared with human-generated source code). Convert it to have a single massive line for each constant. Let me know if you'd like some sample code to try this.

By the way, your repro succeeds with no errors on my box - which version of the framework are you using? (My box has the beta of 4.0 on, which may affect things.)

EDIT: How about changing it to not be a string constant? You'd need to break it up yourself, and emit it as a public static readonly field like this:

public static readonly HugeString = "xxxxxxxxxxxxxxxx" + string.Empty +
    "yyyyyyyyyyyyyyyyyyy" + string.Empty +
    "zzzzzzzzzzzzzzzzzzz";

Crucially, string.Empty is a public static readonly field, not a constant. That means the C# compiler will just emit a call to string.Concat which may well be okay. It'll only happen once at execution time of course - slower than doing it at compile-time, but it may be an easier workaround than anything else.

like image 42
Jon Skeet Avatar answered Nov 05 '22 09:11

Jon Skeet