I am trying to split a single long text file into multiple files. Each section that needs to go into its own file is separated by a line of hyphens, something like:
This is section of some sample text
that says something.
2---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This says something else
3---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Maybe this says something eles
4---------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
I started an attempt in Python without much success. I considered using the split function, but most examples I find for split revolve around len rather than regex-style patterns. My code below only generates one large file.
with open('someName.txt', 'r') as fo:
    start = 1
    cntr = 0
    for x in fo.read().split("\n"):
        if x=='---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------':
            start = 1
            cntr += 1
            continue
        with open(str(cntr) + '.txt', 'a+') as opf:
            if not start:
                x = '\n' + x
            opf.write(x)
            start = 0
You might get better results by switching the condition from == to in. That way, if the line you are testing has any leading characters (such as the digit in front of the hyphens in your sample), it still satisfies the condition. In the example below I changed x == '-----...' to '-----...' in x; the change is at the very end of the long string of hyphens.
with open('someName.txt', 'r') as fo:
    start = 1
    cntr = 0
    for x in fo.read().split("\n"):
        if ('-----------------------------------------------------'
            '-----------------------------------------------------'
            '-----------------------------------------------------'
            '------------------------------------------------') in x:
            start = 1
            cntr += 1
            continue
        with open(str(cntr) + '.txt', 'a+') as opf:
            if not start:
                x = '\n' + x
            opf.write(x)
            start = 0
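To see why the exact comparison never matches, here is a small check. It uses a shortened separator of 20 hyphens purely for readability (an assumption; the real file uses much longer lines):

```python
separator = "-" * 20        # shortened stand-in for the long hyphen string
line = "2" + separator      # a separator line with a leading digit, as in the sample

print(line == separator)    # False: the leading "2" breaks exact equality
print(separator in line)    # True: the substring test ignores the leading digit
```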
An alternative solution would be to use regular expressions. For example...
import re

with open('someName.txt', 'rt') as fo:
    counter = 0
    pattern = re.compile(r'--+')  # this is the regex pattern
    for group in re.split(pattern, fo.read()):
        # the re.split function used in the loop splits text by the pattern
        with open(str(counter) + '.txt', 'a+') as opf:
            opf.write(group)
        counter += 1
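Note that r'--+' also matches any run of two or more hyphens inside ordinary text, and it leaves the digit in front of each separator at the end of the preceding section. A stricter variant might anchor the pattern to whole lines. The sketch below assumes a separator is a line made of optional leading digits followed by at least 10 hyphens; the threshold and the helper names split_sections and write_sections are illustrative choices, not part of the original code.

```python
import re
from pathlib import Path

# Assumed separator: optional digits, then at least 10 hyphens, alone on a line.
SEPARATOR = re.compile(r"^\d*-{10,}[ \t]*$", re.MULTILINE)

def split_sections(text):
    """Return the non-empty sections of text found between separator lines."""
    parts = SEPARATOR.split(text)
    return [part.strip("\n") for part in parts if part.strip()]

def write_sections(text, out_dir):
    """Write each section to out_dir/1.txt, out_dir/2.txt, ... and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, section in enumerate(split_sections(text), start=1):
        path = out_dir / f"{number}.txt"
        # write_text overwrites, so rerunning does not append to old output
        path.write_text(section + "\n")
        paths.append(path)
    return paths
```

It could be called as write_sections(Path('someName.txt').read_text(), 'sections'), where 'sections' is a hypothetical output directory. Opening the output in write mode rather than 'a+' means a second run replaces the files instead of appending to them.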