I have a pandas DataFrame with multiple columns whose values are mixed with unwanted characters.
columnA     columnB  columnC     ColumnD
\x00A\x00B  NULL     \x00C\x00D  123
\x00E\x00F  NULL     NULL        456
What I'd like to do is transform the DataFrame into the one below:
columnA  columnB  columnC  ColumnD
AB       NULL     CD       123
EF       NULL     NULL     456
With my code below I can remove '\x00' from columnA, but columnC is trickier because it is mixed with NULL in certain rows.
import numpy as np

col_names = cols_to_clean
fixer = dict.fromkeys([0x00], u'')
for i in col_names:
    if not df[i].isnull().any():
        if df[i].dtype != np.int64:
            df[i] = df[i].map(lambda x: x.translate(fixer))
Is there any efficient way to remove unwanted characters from columnC?
In general, to remove non-ASCII characters, use str.encode with errors='ignore', then decode back to str:
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')
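For instance, on a small Series with stray accented characters (a minimal sketch; the data and the column name 'col' are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 'café' and 'naïve' contain non-ASCII characters
df = pd.DataFrame({'col': ['café', 'naïve', 'abc']})

# Encode to ASCII, silently dropping unencodable characters,
# then decode the bytes back into a str column
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')

print(df['col'].tolist())  # ['caf', 'nave', 'abc']
```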
To perform this on multiple string columns, use
u = df.select_dtypes(object)
df[u.columns] = u.apply(
lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
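A quick sketch of the multi-column version (toy data; note that select_dtypes(object) leaves numeric columns untouched, so they don't hit the str accessor):

```python
import pandas as pd

# Hypothetical frame mixing string and numeric columns
df = pd.DataFrame({
    'a': ['fooé', 'bar'],
    'b': ['bäz', 'qux'],
    'n': [1, 2],  # numeric column, deliberately skipped
})

# Select only object (string) columns and clean them in one pass
u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))

print(df['a'].tolist())  # ['foo', 'bar']
print(df['b'].tolist())  # ['bz', 'qux']
```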
Although that still won't handle the null characters in your columns, because '\x00' is itself a valid ASCII character and survives the encode/decode round trip. For those, you can replace them using regex:
df2 = df.replace(r'\W+', '', regex=True)