
Open/read command in Tcl 8.5 for large files

Tags:

tcl

Sorry if the title doesn't match my question well, I'm still unsure as to how I should put it.

Anyway, I've been using Tcl/Tk on Windows (wish) for a while now and hadn't run into any problems with the script I wrote until recently. The script is supposed to break a large text file down into smaller files that can be imported into Excel (I'm talking about a file with maybe 25M lines, which comes to around 2.55 GB).

My current script looks something like this:

set data [open "file.txt" r]
set data1 [open "File Part1.txt" w]
set data2 [open "File Part2.txt" w]
set data3 [open "File Part3.txt" w]
set data4 [open "File Part4.txt" w]
set data5 [open "File Part5.txt" w]


set count 0
while {[gets $data line] != -1} {
    if {$count > 4000000} {
        puts $data5 $line
    } elseif {$count > 3000000} {
        puts $data4 $line
    } elseif {$count > 2000000} {
        puts $data3 $line
    } elseif {$count > 1000000} {
        puts $data2 $line
    } else {
        puts $data1 $line
    }
    incr count
}

close $data
close $data1
close $data2
close $data3
close $data4
close $data5

I alter the numbers in the if conditions to get the desired number of lines per file, and add or remove elseif branches where required.
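As a rough sketch of a more general variant, the output file can be switched based on the line count instead of hard-coding an elseif chain. The proc name `splitFile`, its parameters, and the output file naming are assumptions made for illustration and are not part of the original script. It reads the input with the channel's default settings, exactly as the script above does.

```tcl
# Hypothetical helper: split $inPath into parts of at most $linesPerFile lines.
# Output files are named "<prefix><n>.txt"; the naming scheme is illustrative.
# Returns the number of part files written.
proc splitFile {inPath prefix linesPerFile} {
    set in [open $inPath r]
    set out ""
    set count 0
    set part 0
    while {[gets $in line] >= 0} {
        # Start a new part file every $linesPerFile lines.
        if {$count % $linesPerFile == 0} {
            if {$out ne ""} { close $out }
            incr part
            set out [open "${prefix}${part}.txt" w]
        }
        puts $out $line
        incr count
    }
    if {$out ne ""} { close $out }
    close $in
    return $part
}
```

A call such as `splitFile "file.txt" "File Part" 1000000` would then write parts of one million lines each, with only one output file open at a time.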

The problem is that with the latest file I got, I end up with only about half the data (1.22 GB instead of 2.55 GB), and I was wondering if there is a command that tells Tcl to ignore whatever limit it has on how much it can read. I tried to look for one, but I didn't find anything (or anything that I could understand well; I'm still quite the amateur at Tcl ^^;). Can anyone help me?

EDIT (update): I found a program to open large text files and managed to get a preview of the contents of the file directly. There are actually 16,756,263 lines. I changed the script to:

set data [open "file.txt" r]
set data1 [open "File Part1.txt" w]

set count 0
while {[gets $data line] != -1} {
    incr count
}
puts $data1 $count
close $data
close $data1

to find out where the script stops reading, and it stopped here: [image]

There's a character that the text editor is not recognising in the middle line showing as a little square. I tried to use fconfigure like evil otto suggested but I'm afraid I don't quite understand how the channelID, name or value work exactly to escape that character. Um... help?

reEDIT : I managed to find out how fconfigure worked! Thanks evil otto! Um, I'm not sure how I can 'choose' your answer since it's a comment instead of a proper answer...

asked Dec 18 '12 12:12 by Jerry



1 Answer

Is it possible that there is any binary data in "file.txt"? On Windows, Tcl will flag end-of-file if it reads a ^Z (Ctrl-Z, byte 0x1A, the default eofchar for input) in a file, even when more data follows it. You can turn this off with fconfigure:

fconfigure $data -eofchar {}

See the fconfigure documentation for full details.
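To see the effect in isolation, here is a minimal sketch that writes a small sample file with a Ctrl-Z byte in the middle and counts the lines read with and without an end-of-file character. The proc name `countLines` and the file name `eofchar_demo.txt` are made up for this example. The eofchar is set explicitly in both calls, so the behaviour does not depend on the platform default.

```tcl
# Hypothetical helper: count the lines readable from $path
# with the given -eofchar setting on the input channel.
proc countLines {path eofchar} {
    set ch [open $path r]
    fconfigure $ch -eofchar $eofchar
    set n 0
    while {[gets $ch line] >= 0} { incr n }
    close $ch
    return $n
}

# Ctrl-Z is character code 26 (0x1A).
set ctrlZ [format %c 26]

# Build a four-line sample file whose third line starts with Ctrl-Z.
# Binary translation writes the bytes exactly as given.
set sample "eofchar_demo.txt"
set out [open $sample w]
fconfigure $out -translation binary
puts -nonewline $out "a\nb\n${ctrlZ}c\nd\n"
close $out

puts "with ^Z as eofchar: [countLines $sample $ctrlZ] lines"
puts "with eofchar off:   [countLines $sample {}] lines"
file delete $sample
```

With Ctrl-Z as the eofchar, reading stops at the third line, so only the first two lines are counted. With the eofchar disabled, all four lines are read. In the splitting script, this corresponds to calling `fconfigure $data -eofchar {}` right after opening the input file.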

answered Oct 11 '22 13:10 by evil otto