Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing space in dataframe python

I am getting an error in my code because I tried to make a dataframe by calling an element from a csv. I have two columns I call from a file: CompanyName and QualityIssue. There are three types of Quality issues: Equipment Quality, User, and Neither. I run into problems trying to make a dataframe df.Equipment Quality, which obviously doesn't work because there is a space there. I want to take Equipment Quality from the original file and replace the space with an underscore.

input:

Top Calling Customers,         Equipment Quality,    User,    Neither,
Customer 3,                      2,           2,        0,
Customer 1,                      0,           2,        1,
Customer 2,                      0,           1,        0,
Customer 4,                      0,           1,        0,

Here is my code:

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

# Get the data.
data = pd.DataFrame.from_csv('MYDATA.csv')   
# Group the data by calling CompanyName and QualityIssue columns.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size() 
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue) 
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]  
# Replace NaN values with zero.
formatted[np.isnan(formatted)] = 0 
includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1), 
                             columns=['Total'])], axis=1)
sortedtotal = includingtotals.sort_index(by=['Total'], ascending=[False])
sortedtotal.to_csv('byqualityissue.csv')

This seems to be a frequently asked question and I tried lots of the solutions but they didn't seem to work. Here is what I tried:

with open('byqualityissue.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
    sentence.replace(" ", "_")

And

sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' ')) 

And what I thought was the most promising from here http://pandas.pydata.org/pandas-docs/stable/text.html:

formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')

but I got this error: AttributeError: 'Index' object has no attribute 'str'

Thanks for your help in advance!

like image 846
jenryb Avatar asked Jun 10 '15 17:06

jenryb


People also ask

How do I remove spaces from a DataFrame in Python?

str. strip()” to remove the whitespace from the string. Using strip function we can easily remove extra whitespace from leading and trailing whitespace from staring. It returns a series or index of an object.

How do you get rid of spaces in Python?

Python String strip() function will remove leading and trailing whitespaces. If you want to remove only leading or trailing spaces, use lstrip() or rstrip() function instead.

How do I remove spaces from pandas DataFrame column names?

To strip whitespaces from column names, you can use str. strip, str. lstrip and str. rstrip.

How do I remove spaces from a column name?

Removing spaces from column names in pandas is not very hard we easily remove spaces from column names in pandas using replace() function. We can also replace space with another character.


1 Answers

Try:

formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]
like image 69
Alexander Avatar answered Nov 15 '22 20:11

Alexander