I have a similar problem to the one posted here:
Pandas DataFrame: remove unwanted parts from strings in a column
I need to remove newline characters from within a string in a DataFrame. Basically, I've accessed an API using Python's json module and that's all OK. Creating the DataFrame works amazingly, too. However, when I finally want to output the end result to a CSV, I get a bit stuck, because the newlines create false 'new rows' in the CSV file.
So basically I'm trying to turn this:
'...this is a paragraph.
And this is another paragraph...'
into this:
'...this is a paragraph. And this is another paragraph...'
I don't care about preserving any kind of '\n' or any special symbols for the paragraph break. So it can be stripped right out.
I've tried a few variations:
misc['product_desc'] = misc['product_desc'].strip('\n')
AttributeError: 'Series' object has no attribute 'strip'
Here's another:
misc['product_desc'] = misc['product_desc'].str.strip('\n')
TypeError: wrapper() takes exactly 1 argument (2 given)
And two more:
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n'))
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n\t'))
With the two map versions there is no error message, but the newline characters don't go away, either. Same thing with this:
misc = misc.replace('\n', '')
The write to csv line is this:
misc_id.to_csv(r'C:\Users\jlalonde\Desktop\misc_w_id.csv', sep=' ', na_rep='', index=False, encoding='utf-8')
Version of Pandas is 0.9.1
Thanks! :)
Python's built-in str.replace() returns a copy of the string in which all occurrences of a substring are replaced with another substring. Parameters: old, the substring you want to replace; new, the substring that replaces it.
You can replace a string in a pandas DataFrame column by using replace() or str.replace(), or by mapping a lambda function over the column.
To remove characters from columns in a pandas DataFrame, use the replace() method with regex=True. For example, the regex [ab] matches any character that is a or b.
strip only removes the specified characters at the beginning and end of the string. If you want to remove all \n, you need to use replace.
misc['product_desc'] = misc['product_desc'].str.replace('\n', '')
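To make the difference between strip and replace concrete, here is a minimal sketch. The sample frame is made up for illustration, and it assumes a reasonably recent pandas where str.replace accepts the regex keyword (that keyword is not available in 0.9.1). Note that replacing with a space rather than an empty string gives the 'paragraph. And this' result shown in the question; an empty string would join the sentences with no gap.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `misc` frame.
misc = pd.DataFrame({
    "product_desc": [
        "...this is a paragraph.\nAnd this is another paragraph...",
        "\nleading and trailing\n",
    ]
})

# str.strip only touches the ends of each string, so an inner newline survives.
stripped = misc["product_desc"].str.strip("\n")

# str.replace substitutes every occurrence; regex=False treats "\n" literally.
replaced = misc["product_desc"].str.replace("\n", " ", regex=False)

print(repr(stripped.iloc[0]))
print(repr(replaced.iloc[0]))
```

The first printed value still contains the newline in the middle, while the second is a single line.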
You could use the regex parameter of the replace method to achieve that:
misc['product_desc'] = misc['product_desc'].replace(to_replace='\n', value='', regex=True)
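This also explains why the plain misc.replace('\n', '') from the question appeared to do nothing: without regex=True, replace only matches cells whose entire value equals '\n'. A rough sketch of the difference, using invented sample data and an in-memory buffer instead of a real file path (the \r?\n pattern is my own addition to also catch Windows-style line breaks):

```python
import io
import pandas as pd

# Hypothetical sample data; column names are for illustration only.
misc = pd.DataFrame({
    "product_id": [1, 2],
    "product_desc": ["first line.\nsecond line.", "windows\r\nbreak"],
})

# Without regex=True, only cells that are exactly "\n" would be replaced,
# so newlines embedded in longer strings are left alone.
unchanged = misc.replace("\n", " ")

# With regex=True the pattern is applied inside each string cell.
cleaned = misc.replace(to_replace=r"\r?\n", value=" ", regex=True)

# Write to an in-memory buffer to check that each record stays on one line.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

After cleaning, the CSV has one header line plus one line per row, so no false 'new rows' appear.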