Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read large file line-by-line with ADO Stream?

I want to use ADO Stream to read lines from a local large text file with UTF-8 encoding so I try

Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Type = 2
objStream.Open
objStream.LoadFromFile = strFile
objStream.LineSeparator = 10
Do Until objStream.EOS
    strLine = objStream.ReadText(-2)
Loop

However the result is that the script takes lots of RAM and CPU usages. So is there any way to tell the script not to load all the file contents into memory, but just open it and read until it encounters any line separator?

like image 814
Teiv Avatar asked Mar 01 '13 05:03

Teiv


2 Answers

As you work with Stream object, I think it's obvious, however, .LoadFromFile fill current stream with the whole file content, and no any cutomize option to load parial data from file.

As for reading 1 line, you done this already with .ReadText(-2), (-2 = adReadLine).

Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Type = 2
objStream.Open
'objStream.LoadFromFile = strFile ''I see a typo here
objStream.LoadFromFile strFile
objStream.LineSeparator = 10      ''that's Ok
'Do Until objStream.EOS           ''no need this
    strLine = objStream.ReadText(-2)
'Loop
objStream.Close ''add this though!

[EDIT] Well, for .LineSeparator you can use just 3 constants:

Constant Value Description

adCRLF   -1    Default. Carriage return line feed 
adLF     10    Line feed only 
adCR     13    Carriage return only 

If you need to break your Do..Loop at other letter, as .ReadText is the only choice for reading text stream, you may use it in conjunction with InStr function and Exit Do then you find your custom separator.

Const cSeparator = "_" 'your custom separator
Dim strLine, strTotal, index
Do Until objStream.EOS
    strLine = objStream.ReadText(-2)
    index = InStr(1, strLine, cSeparator)
    If index <> 0 Then
        strTotal = strTotal & Left(strLine, index-1)
        Exit Do
    Else
        strTotal = strTotal & strLine
    End If
Loop

Shortly, this is the whole optimization you can do (or at least as far as I know).

like image 129
Panayot Karabakalov Avatar answered Nov 07 '22 07:11

Panayot Karabakalov


If you look at this snippet from J. T. Roff's ADO book, you'll see that in theory you can read from a file line by line (without loading it completely into memory). I tried using the file: protocoll in the source parameter, but did not succeed.

So let's try another approach: To treat the .txt file as a UTF8 encoded trivial (one column) ADO database table, you need a schema.ini file in the source directory:

[linesutf8.txt]
ColNameHeader=False
CharacterSet=65001
Format=TabDelimited
Col1=SampleText CHAR WIDTH 100

Then you can do:

  Dim sTDir  : sTDir   = "M:/lib/kurs0705/testdata"
  Dim sFName : sFName  = "[linesutf8.txt]"
  Dim oDb    : Set oDb = CreateObject("ADODB.Connection")
  Dim sCs    : sCs     = Join(Array( _
          "Provider=MSDASQL" _
        , "Driver={Microsoft Text Driver (*.txt; *.csv)}" _
        , "DBQ=" + sTDir _
  ), ";")
  oDb.open sCs
  WScript.Stdin.Readline
  Dim oRs    : Set oRs = oDb.Execute("SELECT * FROM " & sFName)
  WScript.Stdin.Readline
  Do Until oRS.EOF
     WScript.Echo oRS.Fields(0).Value
     oRs.MoveNext
  Loop
  oRs.Close
  oDb.Close

For some background look here.

like image 43
Ekkehard.Horner Avatar answered Nov 07 '22 05:11

Ekkehard.Horner