Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python/pandas:need help adding double quotes to columns

I need to add double quotes to specific columns in a csv file that my script generates.

Below is the goofy way I thought of doing this. For these two fixed-width fields, it works:

df['DATE'] = df['DATE'].str.ljust(9,'"')
df['DATE'] = df['DATE'].str.rjust(10,'"')

df['DEPT CODE'] = df['DEPT CODE'].str.ljust(15,'"')
df[DEPT CODE'] = df['DEPT CODE'].str.rjust(16,'"')

For the following field, it doesn't. It has a variable length. So, if the value is shorter than the standard 6-digits, I get extra double-quotes: "5673"""

df['ID'] = df['ID'].str.ljust(7,'"')
df['ID'] = df['ID'].str.rjust(8,'"')

I have tried zfill, but the data in the column is a series-- I get "pandas.core.series.Series" when i run

print type(df['ID'])

and I have not been able to convert it to string using astype. I'm not sure why. I have not imported numpy.

I tried using len() to get the length of the ID number and pass it to str.ljust and str.rjust as its first argument, but I think it got hung up on the data not being a string.

Is there a simpler way to apply double-quotes as I need, or is the zfill going to be the way to go?

like image 675
mattrweaver Avatar asked Jan 08 '23 05:01

mattrweaver


2 Answers

You can add a speech mark before / after:

In [11]: df = pd.DataFrame([["a"]], columns=["A"])

In [12]: df
Out[12]:
   A
0  a

In [13]: '"' + df['A'] + '"'
Out[13]:
0    "a"
Name: A, dtype: object

Assigning this back:

In [14]: df['A'] = '"' + df.A + '"'

In [15]: df
Out[15]:
     A
0  "a"

If it's for exporting to csv you can use the quoting kwarg:

In [21]: df = pd.DataFrame([["a"]], columns=["A"])

In [22]: df.to_csv()
Out[22]: ',A\n0,a\n'

In [23]: df.to_csv(quoting=1)
Out[23]: '"","A"\n"0","a"\n'
like image 172
Andy Hayden Avatar answered Jan 09 '23 20:01

Andy Hayden


With numpy, not pandas, you can specify the formatting method when saving to a csv file. As very simple example:

In [209]: np.savetxt('test.txt',['string'],fmt='%r')
In [210]: cat test.txt
'string'

In [211]: np.savetxt('test.txt',['string'],fmt='"%s"')
In [212]: cat test.txt
"string"

I would expect the pandas csv writer to have a similar degree of control, if not more.

like image 31
hpaulj Avatar answered Jan 09 '23 18:01

hpaulj