I'm trying to join the multiple paragraphs of each review into a single paragraph. This is what I have so far:
for x in docs:
    with open(fp) as data_file:
        data_item = json.load(data_file)
    b = data_item['reviews']
    for item in b:
        name = '000' + str(counter) + '.txt'
        file = open(name, 'wb')
        output = item['text']
        " ".join(output.split())
        counter = counter+1
        file.write(output.encode('utf-8'))
        file.close()
It isn't working, however: each .txt output file contains the text exactly as it appears in the JSON field, with the \n\n still in place.
Example JSON:
{ "reviews": [ { "created": "2008-07-09T00:00:00", "text": "There's something reassuring etc. \n\nThe band's skill etc. \n\nCraig Finn's vocals etc.\n", "votes_negative": 0, "votes_positive": 0 } ] }
Resultant output (.txt):
There's something reassuring etc.

The band's skill etc.

Craig Finn's vocals etc.
Many thanks in advance.
You don't assign the result of join to a variable, so output is never changed. Try this:
# sidenote: use enumerate to replace counter
for counter, item in enumerate(b):
    name = '000' + str(counter) + '.txt'
    output = item['text']
    output = ' '.join(output.split())
    # imho with is always nicer than open/close
    with open(name, 'wb') as file:
        file.write(output.encode('utf-8'))
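The underlying reason is that Python strings are immutable: join builds a new string and leaves output untouched, so the result has to be bound to a name. A minimal sketch of that point, using a hypothetical helper called collapse_whitespace (not part of the original code) and the sample review text from the question:

```python
def collapse_whitespace(text):
    """Return text with every run of whitespace replaced by a single space."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace.
    return " ".join(text.split())


sample = ("There's something reassuring etc. \n\n"
          "The band's skill etc. \n\n"
          "Craig Finn's vocals etc.\n")

# Calling join without keeping the result changes nothing.
" ".join(sample.split())
unchanged = sample

# Binding the return value is what gives you the cleaned text.
cleaned = collapse_whitespace(sample)
print(cleaned)
```

If the files are opened in text mode with an explicit encoding, for example open(name, 'w', encoding='utf-8'), the manual encode call is not needed either.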
If I'm reading your question correctly, you want everything on one line, which you could do with this:
...
output = item['text'].replace('\n',' ')
...
Output (runs of spaces remain where the newlines were, because the spaces before them are kept):
There's something reassuring etc.   The band's skill etc.   Craig Finn's vocals etc.
Or, if you want a single newline between paragraphs instead of a blank line:
...
output = item['text'].replace('\n\n','\n')
...
Output:
There's something reassuring etc.
The band's skill etc.
Craig Finn's vocals etc.
# One extra blank line here
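To compare the two replace variants with the split/join approach, here is a small sketch run on the sample text from the question. The variable names are illustrative only, and repr() is used so that leftover spaces and newlines are visible.

```python
sample = ("There's something reassuring etc. \n\n"
          "The band's skill etc. \n\n"
          "Craig Finn's vocals etc.\n")

# Variant 1: every newline becomes a space. Existing spaces are kept,
# so runs of several spaces remain where the blank lines were.
one_line = sample.replace("\n", " ")

# Variant 2: each blank line collapses to a single newline,
# so every paragraph ends up on its own line.
one_per_line = sample.replace("\n\n", "\n")

# Variant 3: split on any whitespace and rejoin with single spaces.
collapsed = " ".join(sample.split())

for label, value in [("replace newline", one_line),
                     ("replace blank line", one_per_line),
                     ("split/join", collapsed)]:
    print(label, repr(value))
```

If single spaces matter in the one-line result, the split/join variant is the one that guarantees them; the plain replace only swaps characters one for one.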