Embeddable language with good string manipulation support

Tags: c, string, embed

I've been working on a C program that does quite a lot of string manipulation and very often needs to be tweaked and recompiled for some sort of special-case processing. I've been thinking that embedding a scripting language with good string manipulation support might make sense for the project.

What language would provide the best string manipulation support while being easy to embed in a C program?

For some extra background...

  • Performance is pretty important (especially startup time)
  • Needs to be easily compiled on multiple platforms (Linux, Solaris, Win32 (ideally with MinGW), Darwin)
  • Needs to be a language which will still be around in 5 years' time

I've looked a little at Python (perhaps too heavyweight?) and Lua (perhaps not focused on string manipulation?) but don't really know enough about them or what other choices might be out there.

Matt Sheppard asked Aug 15 '09 03:08



3 Answers

I've never regretted using Lua.

It's very easy to embed in your application. In fact, now I usually don't write C applications; I just write C libraries and control them from Lua.

Text manipulation isn't its best feature, but it's certainly far better than C alone. And the LPEG library makes building parsers almost trivially easy, putting any regex to shame (though it still offers a couple of regex-like syntaxes if you prefer them).

Javier answered Sep 22 '22 07:09



Lua stands head and shoulders above other choices.

"... best string manipulation support while being easy to embed?"

Lua is designed to be embedded in C; the API is clear and easy to use; the documentation is terrific.

Some other responses have denigrated Lua's string capabilities. I think they're underestimating Lua. Lua's string capabilities actually find a sweet spot between "just concatenation" and the full complexity of regular expressions. String formatting capability is very strong, and accumulating strings through "buffers" or tables is simple and efficient.
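To make the formatting and accumulation point concrete, here is a minimal sketch using only the standard string and table libraries. The `records` table and its field names are made up for illustration; the idea is simply to format each piece with `string.format`, collect the pieces in a table, and join them once with `table.concat` rather than concatenating repeatedly.

```lua
-- Hypothetical records, used only for illustration.
local records = {
  { name = "alpha", size = 10 },
  { name = "beta",  size = 2500 },
}

-- Collect formatted lines in a table, then join them in one step.
-- This avoids building a new intermediate string on every append.
local parts = {}
for _, rec in ipairs(records) do
  -- %-8s left-justifies the name in 8 columns; %6d right-justifies the number.
  parts[#parts + 1] = string.format("%-8s %6d", rec.name, rec.size)
end
local report = table.concat(parts, "\n")
print(report)
```

The table here plays the role of the "buffer": appending to it is cheap, and the final string is only built once.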

String scanning is, in my opinion, one of the best parts of the design. It doesn't have "or" patterns but otherwise gives you a large fraction of what you get from regular expressions, including a very powerful and elegant "capture" function. For example, I can convert a string to hex by capturing every single character and applying a function to it:

s:gsub('.', function(c) return string.format("%02x", string.byte(c)) end)

Or I can escape non-alphanumeric, non-space characters into octal:

s:gsub('[^%w%s]', function(c) return string.format([[\%03o]], string.byte(c)) end)

Some of the features on display here:

  • The escape character for string scanning is %, which is different from the escape character for string quoting, which is \. This decision is brilliant and should win an award by itself :-)

  • There are multiple mechanisms for quoting literal strings, including [[...]] in which no characters have to be escaped. If you want to generate or match strings with backslashes in them (like LaTeX for example), this is a godsend.
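Here is a small sketch of how those two features combine with captures. The input text is a made-up LaTeX-like fragment, and the pattern is my own illustration rather than anything from a real project: it pulls out each command name and its braced argument.

```lua
-- Hypothetical LaTeX-like input; inside [[...]] the backslashes need no escaping.
local text = [[\section{Intro} some text \emph{important} more text]]

-- %\ matches a literal backslash (pattern-level escape with %).
-- Because the pattern is written with [[...]], no string-level escaping is
-- needed; the double-quoted equivalent would be "%\\(%a+){(.-)}".
-- The two parenthesised groups are captures: command name and argument.
local commands = {}
for name, arg in text:gmatch([[%\(%a+){(.-)}]]) do
  commands[#commands + 1] = name .. "=" .. arg
end
print(table.concat(commands, ", "))
```

Each iteration of `gmatch` hands back the captures directly, so there is no separate "match object" to unpack.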

If you want the full power of a context-free parser, you can always use LPEG, a library written by one of Lua's designers.

"Performance is pretty important (especially startup time)"

Lua consistently wins performance awards. Startup is lightning fast: the whole system (including compiler, library, garbage collector, and runtime system) fits in 150KB. To avoid pause times, Lua provides incremental garbage collection. See also the SO question "Why is Lua faster than other scripting languages?"

You can make startup even faster by precompiling your scripts, but I've never found it necessary to do this—and because compiled code (as opposed to source code) is not portable, precompilation usually creates more headache than it solves.
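For the curious, precompilation can also be tried from inside Lua itself, without the separate `luac` tool. This is only a sketch: the `source` string stands in for a script you would normally read from disk, and `load_chunk` is a local alias I introduce to cover both Lua 5.1 (`loadstring`) and later versions (`load`).

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; later versions use load instead.
local load_chunk = loadstring or load

-- Hypothetical script source, standing in for a file read from disk.
local source = "return string.rep('ab', 3)"

local compiled = assert(load_chunk(source))    -- parse and compile once
local bytecode = string.dump(compiled)         -- serialise the compiled chunk
local restored = assert(load_chunk(bytecode))  -- load it back without re-parsing

print(restored())      -- runs the precompiled chunk
print(type(bytecode))  -- the dumped chunk is an ordinary Lua string
```

The dumped string is tied to the Lua build that produced it, which is exactly the portability headache mentioned above.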

"Needs to be easily compiled on multiple platforms"

Lua compiles using pure ANSI C and does not even require POSIX. I have a version running on my PalmOS PDA.

"Needs to be a language which will still be around in 5 years' time."

Lua has been around since 1993. Moreover, the two members of the team who provide the most support are tenured professors at PUC-Rio. Lua is their livelihood. Finally, the whole system is only 17,000 lines of code. If Rio fell off the map tomorrow, anybody with a good undergraduate compiler course could pick the system up and maintain it. There would be plenty of volunteers.

"I've looked a little at Python and Lua but don't really know enough about them"

See the SO question "Which game scripting language is better to use: Lua or Python?"

Norman Ramsey answered Sep 22 '22 07:09



People have been embedding Tcl in larger projects for what seems like ages. It's been a while since I've had to use Tcl for anything...

One of the things that sets Tcl apart from other programming languages is that everything is a string.

And for your reference, here's the Tcl documentation on string functions.

Tcl might be easier to embed than Perl, but I do have to agree with @Matthew Scharley's reasoning. Also, Tcl isn't exactly known for its performance, but maybe that's changed in recent years.

Anyway, here is the Tcl wiki link on embedding Tcl in C applications, and a relevant quote from the page:

"How do I embed a Tcl interpreter in my existing C (or C++) application?" is a very frequently-asked question. It's straightforward, certainly far easier than doing the same with Perl or, in general, Python; moreover, this sort of "embeddability" was one of the original goals for Tcl, and many, many projects do it. There are no complete discussions of the topic available, but we can give an overview here. (RWT 14-Oct-2002)


Another alternative might be to go with Lua, as you mentioned, while extending it with another C string library of your choice (Google turns up The Better String Library, for instance).

Once you've compiled Lua into your application, you can expose your own C functions to Lua's interpreter. Or maybe the built-in string functions are adequate for you.

You certainly have a few options.

Mark Rushakoff answered Sep 21 '22 07:09
