
How to remove special characters from csv using pandas

I'm currently cleaning data from a CSV file. I've successfully made everything lowercase and removed stopwords, punctuation, etc., but I still need to remove special characters. For example, the CSV file contains things such as 'César' and '‘disgrace’'. If there is a way to replace these characters, even better, but I am fine with removing them. Below is the code I have so far.

import pandas as pd
from nltk.corpus import stopwords
import string
from nltk.stem import WordNetLemmatizer

lemma = WordNetLemmatizer()

# read the file once, with the encoding set explicitly
df = pd.read_csv('soccer.csv', encoding='utf-8')

df.columns = ['post_id', 'post_title', 'subreddit']
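# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['post_title'] = df['post_title'].str.lower().str.replace(r'[^\w\s]+', '', regex=True).str.split()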


stop = stopwords.words('english')

df['post_title'] = df['post_title'].apply(lambda x: [item for item in x if item not in stop])

df['post_title']= df['post_title'].apply(lambda x : [lemma.lemmatize(y) for y in x])


df.to_csv('clean_soccer.csv')
plshelpme_ asked Jan 26 '26 08:01



2 Answers

When saving the file try:

df.to_csv('clean_soccer.csv', encoding='utf-8-sig')

or simply

df.to_csv('clean_soccer.csv', encoding='utf-8')
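
For context on the difference between the two options, here is a minimal sketch using only the standard library. 'utf-8-sig' writes the same bytes as 'utf-8' but prepends a byte order mark (BOM), which some spreadsheet programs use to detect UTF-8. Whether that matters depends on what opens the file afterwards; the question does not say, so treat that part as an assumption.

```python
import codecs

text = 'César'
plain = text.encode('utf-8')         # no byte order mark
with_bom = text.encode('utf-8-sig')  # same bytes, BOM in front

# utf-8-sig is plain UTF-8 with the 3-byte BOM prepended
print(with_bom == codecs.BOM_UTF8 + plain)

# reading back with utf-8-sig strips the BOM again
print(with_bom.decode('utf-8-sig') == text)
```

Both comparisons should print True. If the saved file is read back with pandas, passing the same encoding to pd.read_csv keeps the BOM out of the first column name.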
VnC answered Jan 27 '26 22:01



As an alternative to other answers, you could use string.printable:

import string

printable = set(string.printable)

def remove_spec_chars(in_str):
    return ''.join([c for c in in_str if c in printable])

# apply returns a new Series, so assign it back to the column
df['post_title'] = df['post_title'].apply(remove_spec_chars)

For reference, string.printable is a fixed string of ASCII characters (it does not vary by machine): the combination of digits, ascii_letters, punctuation, and whitespace.

For your examples, this function turns 'César' into 'Csar' and '‘disgrace’' into 'disgrace'.
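
Since the question mentions that replacing would be even better than removing, here is one possible sketch using the standard library's unicodedata module. It decomposes accented letters so the base letter survives, then drops whatever still has no ASCII form. The helper name to_ascii is made up for illustration, and this is only one way to do it; characters without a decomposition (such as curly quotes) are still removed, not replaced.

```python
import unicodedata

def to_ascii(text):
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFKD', text)
    # encoding with errors='ignore' drops the combining marks and any other
    # character that has no ASCII equivalent (e.g. curly quotes)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('César'))       # Cesar
print(to_ascii('‘disgrace’'))  # disgrace
```

On the DataFrame from the question this could be applied with df['post_title'] = df['post_title'].apply(to_ascii), before the .str.split() step, while the column still holds strings.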

https://docs.python.org/3/library/string.html
How can I remove non-ASCII characters but leave periods and spaces using Python?

xibalba1 answered Jan 27 '26 21:01



