
How to load multiple text files from a folder into a python list variable

Tags:

python

I have a folder full of text documents, the text of which needs to be loaded into a single list variable.

Each element of the list should be the full text of one document.

So far I have this code, but it is not working as expected.

dir = os.path.join(current_working_directory, 'FolderName')
file_list = glob.glob(dir + '/*.txt')
corpus = [] #-->my list variable
for file_path in file_list:
    text_file = open(file_path, 'r')
    corpus.append(text_file.readlines()) 
    text_file.close()

Is there a better way to do this?

Edit: Replaced the csv reading function (read_csv) with text reading function (readlines()).

Minu asked Feb 06 '23 01:02



2 Answers

You just need to read() each file in and append it to your corpus list as follows:

import glob
import os

file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

corpus = []

for file_path in file_list:
    with open(file_path) as f_input:
        corpus.append(f_input.read())

print(corpus)

Each list entry would then be the entire contents of each text file. Note, using readlines() would give you a list of lines for each file rather than the raw text.
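To make the difference between read() and readlines() concrete, here is a minimal sketch. The file name doc1.txt and its contents are made up for the demonstration, and a temporary directory stands in for the real folder.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "doc1.txt"
    sample.write_text("first line\nsecond line\n", encoding="utf-8")

    with open(sample, encoding="utf-8") as f:
        as_text = f.read()        # one string holding the whole file

    with open(sample, encoding="utf-8") as f:
        as_lines = f.readlines()  # list of strings, one per line

print(repr(as_text))  # 'first line\nsecond line\n'
print(as_lines)       # ['first line\n', 'second line\n']
```

Appending as_lines to corpus would therefore produce a list of lists, which is what the code in the question does.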

With a list comprehension:

file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

corpus = [open(file).read() for file in file_list]

This approach, though, may use more resources, as there is no with block to close each file automatically; each file stays open until its file object is garbage-collected.
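One way to keep the list comprehension and still close every file is to move the with block into a small helper. This is only a sketch: read_file is a hypothetical helper name, the UTF-8 encoding is an assumption about the documents, and the temporary folder with a.txt and b.txt stands in for FolderName.

```python
import glob
import os
import tempfile


def read_file(path, encoding="utf-8"):
    """Return the full text of one file, closing it before returning."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical demo folder standing in for "FolderName".
with tempfile.TemporaryDirectory() as folder:
    for name, body in [("a.txt", "alpha"), ("b.txt", "beta")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write(body)

    # glob.glob() returns paths in arbitrary order, so sort for a stable corpus.
    file_list = sorted(glob.glob(os.path.join(folder, "*.txt")))
    corpus = [read_file(path) for path in file_list]

print(corpus)  # ['alpha', 'beta']
```

Sorting is optional, but it makes the order of corpus reproducible between runs.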

Martin Evans answered Feb 08 '23 15:02



  • Solve this with the pathlib module, which treats paths as objects with methods.
  • Use Path() to create a pathlib object for the path (or use .cwd() for the current working directory), and use .glob() (or .rglob() to search subfolders too) to find the files matching a specific pattern.
    • files = (Path().cwd() / 'FolderName').glob('*.txt')
      • The / operator appends path components (such as folder names) to a pathlib object.
    • Alternatives:
      • files = Path('./FolderName').glob('*.txt')
      • files = Path('e:/PythonProjects/stack_overflow/t-files/').glob('*.txt')
  • Path.read_text() reads each file's text without an explicit .open(); the file is opened, read, and then closed.
    • text = [f.read_text() for f in files]
    • Alternatives:
      • text = [f.open().read() for f in files] - works, but leaves closing each file to garbage collection.
      • text = [f.open().readlines() for f in files] - creates a list of lists of lines, and likewise does not close the files explicitly.
from pathlib import Path

# get the files
files = (Path().cwd() / 'FolderName').glob('*.txt')

# read the text of each file into a list with a list comprehension - each file is opened and closed
text = [f.read_text() for f in files]
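A possible refinement, sketched under a few assumptions: .glob() does not promise any particular order, so sorting the paths keeps each entry of text aligned with a known file name, and passing encoding explicitly avoids depending on the platform default. The folder contents below are invented for the demonstration, and UTF-8 is an assumption about the documents.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the real 'FolderName' directory.
    folder = Path(tmp) / "FolderName"
    folder.mkdir()
    (folder / "b.txt").write_text("second document", encoding="utf-8")
    (folder / "a.txt").write_text("first document", encoding="utf-8")
    (folder / "notes.md").write_text("ignored", encoding="utf-8")

    # Sort the matches so the list order is reproducible.
    files = sorted(folder.glob("*.txt"))
    names = [f.name for f in files]
    text = [f.read_text(encoding="utf-8") for f in files]

print(names)  # ['a.txt', 'b.txt']
print(text)   # ['first document', 'second document']
```

Keeping names alongside text makes it easy to trace any list entry back to its source file.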

for-loop Alternative

Option 1

files = Path('./FolderName').glob('*.txt')

text = list()

for file in files:
    text.append(file.read_text())  # the file is opened and closed

Option 2

  • Path.open() in a with block, combined with .read(), opens each file, reads its text into the list, and closes the file.
files = Path('./FolderName').glob('*.txt')

text = list()

for file in files:
    with file.open() as f:
        text.append(f.read())
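If the text files are spread across subfolders, the .rglob() method mentioned earlier can replace .glob(). The sketch below contrasts the two on an invented directory layout; the folder and file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file at the top level, one in a subfolder.
    folder = Path(tmp) / "FolderName"
    (folder / "sub").mkdir(parents=True)
    (folder / "top.txt").write_text("top level", encoding="utf-8")
    (folder / "sub" / "nested.txt").write_text("nested", encoding="utf-8")

    # glob() only looks in the folder itself; rglob() also descends into subfolders.
    direct = sorted(p.name for p in folder.glob("*.txt"))
    recursive = sorted(p.name for p in folder.rglob("*.txt"))

print(direct)     # ['top.txt']
print(recursive)  # ['nested.txt', 'top.txt']
```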
  • Also see SO: How to open every file in a folder
Trenton McKinney answered Feb 08 '23 16:02
