Pandas - Strip white space

Tags: python, pandas, csv

I am using pandas in Python to compare two files like this:

import pandas as pd

# sep and delimiter are aliases, so pass only one of them
df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8")
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8")
df3 = pd.merge(df1, df2, on='employee_id', how='right')
df3.to_csv('output.csv', encoding='utf-8', index=False)

Currently I run each file through a script beforehand that strips spaces from the employee_id column.

An example of employee_ids:

37 78973 3 23787 2 22 3 123

Is there a way to get pandas to do it and save me a step?

asked Apr 10 '17 20:04

fightstarr20

4 Answers

You can strip() an entire Series in Pandas using .str.strip():

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

This removes leading and trailing whitespace from the employee_id column in both df1 and df2.
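One caveat worth sketching: the .str accessor only exists on string columns, so if pandas infers employee_id as an integer column, .str.strip() raises an AttributeError. Below is a minimal sketch that forces the column to be read as text first. The inline CSV data and the name/dept columns are made up for illustration and stand in for input1.csv and input2.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for input1.csv / input2.csv.
csv1 = "employee_id,name\n 123 ,Ann\n456,Bob\n"
csv2 = "employee_id,dept\n123,Sales\n 456,Ops\n"

# dtype=str keeps the IDs as text so the .str accessor is available.
df1 = pd.read_csv(io.StringIO(csv1), dtype={"employee_id": str})
df2 = pd.read_csv(io.StringIO(csv2), dtype={"employee_id": str})

# Remove leading/trailing whitespace from the merge key on both sides.
df1["employee_id"] = df1["employee_id"].str.strip()
df2["employee_id"] = df2["employee_id"].str.strip()

df3 = pd.merge(df1, df2, on="employee_id", how="right")
print(df3)
```

Keeping the key as a string on both sides also avoids merging a text column against a numeric one.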

Alternatively, you can modify your read_csv lines to also use skipinitialspace=True, which skips spaces after each delimiter (trailing spaces are left in place):

df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)
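To see what skipinitialspace does and does not cover, here is a small sketch on made-up inline data (the name and dept columns are assumptions, not part of the question). It reads every column as text so the remaining whitespace is visible.

```python
import io

import pandas as pd

# Hypothetical row: the ID has a space on both sides.
raw = "name,employee_id,dept\nAnn, 123 ,Sales\n"

# dtype=str prevents numeric inference so the raw text is preserved.
df = pd.read_csv(io.StringIO(raw), dtype=str, skipinitialspace=True)

value = df.loc[0, "employee_id"]
print(repr(value))
```

In this sketch the space after the comma is skipped but the one before the next comma is kept, so a .str.strip() may still be needed if the files contain trailing spaces.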

It looks like you are attempting to remove spaces in a string containing numbers. You can do this by:

df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
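The difference between the two approaches is easiest to see side by side. This is a rough sketch on a hand-made Series whose values are modeled on the IDs in the question; the exact values are assumptions.

```python
import pandas as pd

# Hypothetical IDs with leading, trailing, and internal spaces.
ids = pd.Series([" 37 78973 3", "23787 ", "2 22 3", "123"])

# strip() only touches the ends of each string.
stripped = ids.str.strip()

# replace() removes every space; regex=False treats " " as a literal.
no_spaces = ids.str.replace(" ", "", regex=False)

print(stripped.tolist())
print(no_spaces.tolist())
```

If the IDs are meant to match across files regardless of internal spacing, the replace() form is the one that makes the merge keys comparable.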

answered Nov 05 '22 06:11

Andy

You can do the strip() inside pandas.read_csv() by passing a converter for the column:

pandas.read_csv(..., converters={'employee_id': str.strip})

And if you need to only strip leading whitespace:

pandas.read_csv(..., converters={'employee_id': str.lstrip})

And to remove all spaces:

def strip_spaces(a_str_with_spaces):
    return a_str_with_spaces.replace(' ', '')

pandas.read_csv(..., converters={'employee_id': strip_spaces})
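Putting the converter together with the merge from the question might look like the sketch below. The inline CSV data and the name/dept columns are invented for illustration. Converters receive the raw cell text, so the IDs stay strings here.

```python
import io

import pandas as pd


def strip_spaces(value):
    """Remove every space character from a raw CSV cell."""
    return value.replace(" ", "")


# Hypothetical sample data standing in for input1.csv / input2.csv.
csv1 = "employee_id,name\n37 78973 3,Ann\n 123,Bob\n"
csv2 = "employee_id,dept\n37789733,Sales\n123 ,Ops\n"

# The converter cleans the key column while the file is being parsed.
df1 = pd.read_csv(io.StringIO(csv1), converters={"employee_id": strip_spaces})
df2 = pd.read_csv(io.StringIO(csv2), converters={"employee_id": strip_spaces})

df3 = pd.merge(df1, df2, on="employee_id", how="right")
print(df3)
```

This keeps the cleanup in the read step, which is what the question asks for, at the cost of calling a Python function once per cell in that column.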

answered Nov 05 '22 05:11

Stephen Rauch

df['employee_id'] = df['employee_id'].str.strip()

answered Nov 05 '22 06:11

Vipin

A simple way to remove leading and trailing whitespace from a column in a pandas DataFrame is:

df1 = pd.read_csv('input1.csv')
df1["employee_id"] = df1["employee_id"].str.strip()

That's it.

answered Nov 05 '22 05:11

Saeed Khan
