I have a folder full of text documents, the text of which needs to be loaded into a single list variable.
Each element of the list should be the full text of one document.
So far I have this code, but it is not working well.
    import glob
    import os

    current_working_directory = os.getcwd()
    dir = os.path.join(current_working_directory, 'FolderName')
    file_list = glob.glob(dir + '/*.txt')
    corpus = []  # --> my list variable
    for file_path in file_list:
        text_file = open(file_path, 'r')
        corpus.append(text_file.readlines())
        text_file.close()
Is there a better way to do this?
Edit: Replaced the CSV-reading function (read_csv) with the text-reading function (readlines()).
You just need to read() each file and append it to your corpus list, as follows:
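    import glob
    import os

    # collect every .txt file in FolderName under the current working directory
    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))
    corpus = []
    for file_path in file_list:
        # the with block closes the file as soon as it has been read
        with open(file_path) as f_input:
            corpus.append(f_input.read())
    print(corpus)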
Each list entry would then be the entire contents of one text file. Note that using readlines() would give you a list of lines for each file rather than the raw text.
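The difference between read() and readlines() is easy to check on a small file. The following is only a sketch: the temporary directory and the file name sample.txt are made up for the demonstration and are not part of the question.

```python
import os
import tempfile

# Create a throwaway two-line file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    with open(sample_path, "w") as f_output:
        f_output.write("first line\nsecond line\n")

    with open(sample_path) as f_input:
        whole_text = f_input.read()       # one string holding the whole file
    with open(sample_path) as f_input:
        line_list = f_input.readlines()   # a list with one string per line

print(repr(whole_text))  # 'first line\nsecond line\n'
print(line_list)         # ['first line\n', 'second line\n']
```

Appending line_list to the corpus would therefore produce a list of lists, which is what the code in the question does.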
A list comprehension can do the same thing in one line:
    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))
    corpus = [open(file).read() for file in file_list]
This approach, though, might use more resources, as there is no with block to close each file automatically.
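One possible way to keep the comprehension and still close every file is to move the with block into a small helper. This is a sketch rather than part of the answer above: read_file and load_corpus are hypothetical names, the demo files are created in a temporary directory, and the sorted() call is an addition to make the order predictable.

```python
import glob
import os
import tempfile


def read_file(file_path):
    """Return the full text of one file; the with block closes it first."""
    with open(file_path) as f_input:
        return f_input.read()


def load_corpus(folder):
    """Return the text of every .txt file in folder, sorted by file path."""
    file_list = sorted(glob.glob(os.path.join(folder, "*.txt")))
    return [read_file(file_path) for file_path in file_list]


# Demo on two throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, content in [("a.txt", "alpha"), ("b.txt", "beta")]:
        with open(os.path.join(tmp_dir, name), "w") as f_output:
            f_output.write(content)
    corpus = load_corpus(tmp_dir)

print(corpus)  # ['alpha', 'beta']
```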
Use the pathlib module, which treats paths as objects with methods.
Use Path() to create a pathlib object of the path (or use .cwd()), and use .glob() (or .rglob()) to find the files matching a specific pattern.
    files = (Path().cwd() / 'FolderName').glob('*.txt')
/ is used to add folders (extend) to a pathlib object.
The path can also be passed to Path() directly:
    files = Path('./FolderName').glob('*.txt')
    files = Path('e:/PythonProjects/stack_overflow/t-files/').glob('*.txt')
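To illustrate how .glob() and .rglob() differ, here is a small sketch. The folder layout (a sub folder and the file names top.txt, nested.txt and notes.md) is invented for the demonstration and built in a temporary directory.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: FolderName/top.txt, FolderName/notes.md,
    # and FolderName/sub/nested.txt
    folder = Path(tmp_dir) / "FolderName"
    (folder / "sub").mkdir(parents=True)
    (folder / "top.txt").write_text("top")
    (folder / "notes.md").write_text("ignored")
    (folder / "sub" / "nested.txt").write_text("nested")

    # .glob() only looks in the folder itself
    direct = sorted(p.name for p in folder.glob("*.txt"))
    # .rglob() also descends into subfolders
    recursive = sorted(p.name for p in folder.rglob("*.txt"))

print(direct)     # ['top.txt']
print(recursive)  # ['nested.txt', 'top.txt']
```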
Path.read_text() can be used to read the text into a list, without using .open(). The file is opened and then closed.
    text = [f.read_text() for f in files]
Alternatively, with .open():
    text = [f.open().read() for f in files]
    text = [f.open().readlines() for f in files]  # creates a list of lists of text

    from pathlib import Path

    # get the files
    files = (Path().cwd() / 'FolderName').glob('*.txt')

    # write the text from each file into a list with a list comprehension - the file is opened and closed
    text = [f.read_text() for f in files]
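One detail worth keeping in mind is that .glob() returns a generator, so it can only be iterated once. The sketch below shows this on two throwaway files. The file names, the explicit encoding="utf-8" argument and the sorted() calls are assumptions added for the demonstration, not something the code above requires.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / "a.txt").write_text("alpha", encoding="utf-8")
    (folder / "b.txt").write_text("beta", encoding="utf-8")

    files = folder.glob("*.txt")  # a generator, consumed by the first loop
    first_pass = sorted(f.read_text(encoding="utf-8") for f in files)
    second_pass = [f.read_text(encoding="utf-8") for f in files]  # nothing left

    # Materialising the matches as a sorted list makes them reusable
    # and gives the resulting texts a predictable order.
    file_list = sorted(folder.glob("*.txt"))
    text = [f.read_text(encoding="utf-8") for f in file_list]

print(first_pass)   # ['alpha', 'beta']
print(second_pass)  # []
print(text)         # ['alpha', 'beta']
```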
for-loop alternative:
    files = Path('./FolderName').glob('*.txt')
    text = list()
    for file in files:
        text.append(file.read_text())  # the file is opened and closed
Path.open() with .read() can be used to open the file, read its text into a list, and close the file:
    files = Path('./FolderName').glob('*.txt')
    text = list()
    for file in files:
        with file.open() as f:
            text.append(f.read())