
Rust slower than Python at parsing files

I was hoping to use Rust to speed up some of the text-processing scripts that are currently written in Python.

In order to test the performance of the two languages, I decided to test them on a very simple task:

  • Read in a file from STDIN, line by line.
  • If the line starts with >, save the line to a headers.txt file.
  • Otherwise, save the line to a sequences.txt file.

For this test, I am using a fasta file with 10 million lines, which looks as follows:

$ head uniparc_active-head.fasta
>UPI0000000001 status=active
MGAAASIQTTVNTLSERISSKLEQEANASAQTKCDIEIGNFYIRQNHGCNLTVKNMCSAD
ADAQLDAVLSAATETYSGLTPEQKAYVPAMFTAALNIQTSVNTVVRDFENYVKQTCNSSA
VVDNKLKIQNVIIDECYGAPGSPTNLEFINTGSSKGNCAIKALMQLTTKATTQIAPKQVA
GTGVQFYMIVIGVIILAALFMYYAKRMLFTSTNDKIKLILANKENVHWTTYMDTFFRTSP
MVIATTDMQN
>UPI0000000002 status=active
MMTPENDEEQTSVFSATVYGDKIQGKNKRKRVIGLCIRISMVISLLSMITMSAFLIVRLN
QCMSANEAAITDAAVAVAAASSTHRKVASSTTQYDHKESCNGLYYQGSCYILHSDYQLFS
DAKANCTAESSTLPNKSDVLITWLIDYVEDTWGSDGNPITKTTSDYQDSDVSQEVRKYFC

Here is my Python script:

import fileinput

with open('headers.txt', 'w') as hof, \
        open('sequences.txt', 'w') as sof:
    for line in fileinput.input():
        if line[0] == '>':
            hof.write(line)
        else:
            sof.write(line)

and my Rust program (which I compile with cargo build --release):

use std::io;
use std::fs::File;
use std::io::Write;
use std::io::BufRead;

fn main() {
    let stdin = io::stdin();
    let mut headers = File::create("headers.txt").unwrap();
    let mut sequences = File::create("sequences.txt").unwrap();

    for line in stdin.lock().lines() {
        let line = line.unwrap();
        // Look at the first character instead of slicing, so an empty line cannot panic.
        match line.chars().next() {
            Some('>') => writeln!(headers, "{}", line).unwrap(),
            _ => writeln!(sequences, "{}", line).unwrap(),
        }
    }
}

Running some benchmarks:

Python 2.7

$ time bash -c 'cat uniparc_active-head.fasta | python2 src/main.py'
real    0m11.704s
user    0m6.996s
sys     0m1.100s

Python 3.5

$ time bash -c 'cat uniparc_active-head.fasta | python3 src/main.py'
real    0m16.788s
user    0m12.508s
sys     0m1.576s

PyPy 5.3.1

$ time bash -c 'cat uniparc_active-head.fasta | pypy src/main.py'
real    0m6.526s
user    0m1.536s
sys     0m0.884s

Rust 1.14.0

$ cargo build --release
$ time bash -c 'cat uniparc_active-head.fasta | target/release/parse_text'
real    0m17.493s
user    0m2.728s
sys     0m15.408s

So Rust is ~3x slower than PyPy, and even slower than Python 3.

Can anyone shed some light on this? Did I make a mistake in the Rust code? If not, should I stick to Python / PyPy for processing text files, or is there another language that would be better for the job?

ostrokach asked Dec 30 '16 03:12


1 Answer

As suggested by @BurntSushi5, replacing

let mut headers = File::create("headers.txt").unwrap();
let mut sequences = File::create("sequences.txt").unwrap();

with

let mut headers = io::BufWriter::new(File::create("headers.txt").unwrap());
let mut sequences = io::BufWriter::new(File::create("sequences.txt").unwrap());

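brought the speed up to what I expected. Writes to a bare File are unbuffered, so each writeln! call issued at least one write system call per line, which matches the 15.4 s of sys time in the original run. BufWriter collects the output in memory and writes it out in larger chunks: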

$ time bash -c 'cat uniparc_active-head.fasta | target/release/parse_text'
real    0m5.645s
user    0m1.396s
sys     0m0.804s
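
For reference, here is a minimal sketch of what the whole program might look like with the buffered writers in place. The split_fasta helper and its generic parameters are my own naming for illustration, not something from the original code. It uses starts_with for the prefix check, which also handles empty lines, and it flushes both writers explicitly so that a failed final write is reported instead of being silently dropped when the BufWriter goes out of scope.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufWriter, Write};

/// Hypothetical helper: send lines starting with '>' to `headers`
/// and every other line to `sequences`.
fn split_fasta<R: BufRead, H: Write, S: Write>(
    input: R,
    headers: &mut H,
    sequences: &mut S,
) -> io::Result<()> {
    for line in input.lines() {
        // `lines()` strips the trailing newline, so `writeln!` adds it back.
        let line = line?;
        if line.starts_with('>') {
            writeln!(headers, "{}", line)?;
        } else {
            writeln!(sequences, "{}", line)?;
        }
    }
    Ok(())
}

fn main() {
    let stdin = io::stdin();

    // Wrap each file in a BufWriter so that the many small writes are
    // batched in memory before they reach the operating system.
    let mut headers = BufWriter::new(File::create("headers.txt").expect("create headers.txt"));
    let mut sequences = BufWriter::new(File::create("sequences.txt").expect("create sequences.txt"));

    split_fasta(stdin.lock(), &mut headers, &mut sequences).expect("split input");

    // Flush explicitly: errors during the implicit flush on drop are ignored.
    headers.flush().expect("flush headers.txt");
    sequences.flush().expect("flush sequences.txt");
}
```

It is built and timed with the same cargo build --release and time commands as in the question.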
ostrokach answered Nov 15 '22 10:11