
Is Perl or C faster at parsing?


I have a few very large log files, and I need to parse them. Ease of implementation obviously points me to a Perl and regex combo (in which I am still a novice). But what about speed? Would the parsing run faster if I implemented it in C? Each log file is on the order of 2 GB.

Alphaneo asked Apr 13 '09 07:04




1 Answer

I very much doubt C will be faster than Perl unless you were to hand-compile the regular expression (RE).

By hand-compiling, I mean coding the finite state machine (FSM) directly rather than using the RE engine to compile it. This approach means you can optimize it for your specific case which can often be faster than relying on the more general-purpose engine.

But that's not something I'd ever suggest to anyone who hasn't had to write compilers or parsers before without the benefit of lex, yacc, bison or other similar tools.
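To make "hand-compiling" concrete, here is a minimal sketch of a state machine written out by hand. The pattern (`E[0-9]+:`, meaning an `E`, one or more digits, then a colon) and the function name `has_error_code` are made up for illustration; they are not from the question's log format.

```c
#include <stdbool.h>

/* States of a hand-written matcher equivalent to the regex E[0-9]+:
   (the pattern itself is a hypothetical example). */
enum state { START, SAW_E, IN_DIGITS };

/* Returns true if `s` contains 'E', one or more digits, then ':',
   for instance "E404:". */
bool has_error_code(const char *s)
{
    enum state st = START;

    for (; *s != '\0'; s++) {
        char c = *s;
        bool digit = (c >= '0' && c <= '9');

        switch (st) {
        case START:
            if (c == 'E')
                st = SAW_E;
            break;
        case SAW_E:
            if (digit)
                st = IN_DIGITS;
            else if (c != 'E')      /* another 'E' restarts the match here */
                st = START;
            break;
        case IN_DIGITS:
            if (c == ':')
                return true;        /* accepting transition */
            if (c == 'E')
                st = SAW_E;
            else if (!digit)
                st = START;
            break;
        }
    }
    return false;
}
```

Each input character is inspected exactly once and there is no engine overhead, which is where any speed advantage would come from. The cost is that every change to the pattern means reworking the transitions by hand.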

The generalized engines, such as PCRE, are usually powerful and fast enough (for my needs anyway, and those needs have often been very demanding).

A general RE engine needs to be able to handle all sorts of cases, whether you call it from C or from Perl. So when you think about which is faster, you only have to compare what the RE engines themselves are written in (hint: the Perl RE engine is not written in Perl).

They're both written in C so you should find very little difference in terms of the matching speed.

You may find differences in the support code around the REs but that will be minimal, especially if it's a simple read/match/output loop.
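For comparison, a simple read/match/output loop in C could look roughly like the sketch below. It uses the POSIX `regcomp`/`regexec` interface rather than PCRE, purely as an assumption to keep the example short; the helper name `filter_lines` and the 4096-byte line limit are also assumptions.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: copy every line of `in` that matches the POSIX
   extended regex `pattern` to `out`. Returns the number of matching
   lines, or -1 if the pattern fails to compile. */
long filter_lines(FILE *in, FILE *out, const char *pattern)
{
    regex_t re;
    char line[4096];    /* assumption: no log line exceeds this size */
    long matched = 0;

    /* Compile once, outside the loop; REG_NOSUB skips capture tracking. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            fputs(line, out);
            matched++;
        }
    }

    regfree(&re);
    return matched;
}
```

Calling something like `filter_lines(stdin, stdout, "ERROR [0-9]+")` would act as a basic filter. Almost all of the time goes into `regexec` and file I/O, which is why the surrounding support code matters so little.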

paxdiablo answered Sep 23 '22 16:09
