I have a DataFrame like the following. Its <code>id</code> column should be numeric, but it contains some non-numeric values: <pre class="prettyprint"><code>id, name
1, A
2, B
3, C
tt, D
4, E
5, F
de, G
</code></pre> Is there a concise way to remove the rows whose <code>id</code> is not numeric (here <code>tt</code> and <code>de</code>) <pre class="prettyprint"><code>tt,D
de,G
</code></pre> so that the DataFrame is clean? <pre class="prettyprint"><code>id, name
1, A
2, B
3, C
4, E
5, F
</code></pre>

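Using <code>pd.to_numeric</code>: with <code>errors='coerce'</code>, values that cannot be parsed become <code>NaN</code>, so <code>notnull()</code> gives a boolean mask that selects only the numeric rows. <pre class="prettyprint"><code>In [1079]: df[pd.to_numeric(df['id'], errors='coerce').notnull()]
Out[1079]:
  id  name
0  1     A
1  2     B
2  3     C
4  4     E
5  5     F
</code></pre>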
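Note that the filter above keeps the surviving <code>id</code> values as strings. If you also want the column to end up with a numeric dtype, one possible follow-up is sketched below. The variable names <code>numeric_id</code> and <code>clean</code> are only illustrative, and the sketch assumes every remaining <code>id</code> is a whole number, so casting to <code>int64</code> is safe.

```python
import pandas as pd
from io import StringIO

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

# Values that cannot be parsed as numbers become NaN
numeric_id = pd.to_numeric(df["id"], errors="coerce")
mask = numeric_id.notna()

# Keep only the rows that parsed, then store the parsed values
# (assumption: the remaining ids are whole numbers)
clean = df[mask].copy()
clean["id"] = numeric_id[mask].astype("int64")

print(clean)
print(clean.dtypes)
```

The original row labels (0, 1, 2, 4, 5) are preserved; call <code>reset_index(drop=True)</code> if you want them renumbered.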

You could use the standard string method <code>isnumeric</code> and apply it to each value in your <code>id</code> column: <pre class="prettyprint"><code>import pandas as pd
from io import StringIO

data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

In [55]: df
Out[55]:
   id name
0   1    A
1   2    B
2   3    C
3  tt    D
4   4    E
5   5    F
6  de    G

In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]:
  id name
0  1    A
1  2    B
2  3    C
4  4    E
5  5    F
</code></pre> Or, if you want to use <code>id</code> as the index, you could do: <pre class="prettyprint"><code>In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]:
   name
id
1     A
2     B
3     C
4     E
5     F
</code></pre> <h3>Edit: timings</h3> Although the <code>pd.to_numeric</code> approach does not use <code>apply</code>, on this data it is almost two times slower than applying <code>str.isnumeric</code> to each value of a string column. I also added a variant that uses the pandas <code>str.isnumeric</code> accessor, which is less typing and still faster than <code>pd.to_numeric</code>. However, <code>pd.to_numeric</code> is more general because it works with any data type, not only strings. <pre class="prettyprint"><code>In [3]: df_big = pd.concat([df]*10000)

In [4]: df_big.shape
Out[4]: (70000, 2)

In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
</code></pre>
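One caveat when choosing between the two approaches: <code>isnumeric</code> only checks that every character is a numeric character, so it rejects strings with a sign or a decimal point, while <code>pd.to_numeric</code> parses them. The small comparison below is a sketch on made-up values (they are not from the question) to show where the two masks differ.

```python
import pandas as pd

# Hypothetical sample values chosen to show the difference
s = pd.Series(["1", "-2", "3.5", "tt"], name="id")

# True only when every character of the string is numeric
by_isnumeric = s.str.isnumeric()

# True when the whole string can be parsed as a number
by_to_numeric = pd.to_numeric(s, errors="coerce").notna()

print(pd.DataFrame({"id": s,
                    "isnumeric": by_isnumeric,
                    "to_numeric": by_to_numeric}))
```

So if your ids can be negative or fractional, the <code>pd.to_numeric</code> mask is probably the safer choice. Also keep in mind that <code>apply(lambda x: x.isnumeric())</code> raises an <code>AttributeError</code> if the column holds non-string values such as integers.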

Remove non-numeric rows in one column with pandas

Tags:

python

pandas

HungUnicorn

2 Answers

Zero

Anton Protopopov
