
Write Pandas DataFrame to file using FORTRAN format string

I would like to write a pandas dataframe to a file using a FORTRAN format string. I haven't been able to find anything online except a discussion of how this functionality would be nice. Does anyone know if this is possible?

I suppose I don't need to use a fortran format string...I just need to get the output file in a specific format that fortran can easily read.

UPDATE: For example, I have a large data file that has a specified fortran format. I load the file into my python function, manipulate the data, and then would like to export the manipulated data into a file with the same format it had originally. An example of the file format would be something like:

FORMAT (1X,F12.6,2F9.6,F11.7,T61,2F9.6,F10.7,T142,I6,1X,A2,T236,A1)

I need to export the data in a specific format because the output file will be read directly by a well-established Fortran code (meaning the Fortran code cannot be altered).

WillaB asked Oct 20 '25 01:10


1 Answer

Here's a nice tidy solution that uses the fortranformat package (pip install fortranformat, https://pypi.org/project/fortranformat/) and df.apply(), which lets you use a standard Fortran format string:

import fortranformat as ff
import pandas as pd 

df = pd.DataFrame({
        'sampleId': ['A','B','C','D'],        
        'var1' : [0.002,0.004,0.006,0.002],
        'var2' : [1.2,1.4,1.6,1.2],
        'Nobs': [32,12,9,30]
    })

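# one edit descriptor per column, in the same order as the dataframe columns
format_string = '(a5, f8.3, f8.1, i5)'
line_writer = ff.FortranRecordWriter(format_string)
# format each row into a single fixed-width string
Formatted_df = df.apply(lambda x : line_writer.write(x.values),axis=1)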

The Formatted_df object will be a Series with a string element for each row of the dataframe:

>>> print(Formatted_df)
0        A   0.002     1.2   32
1        B   0.004     1.4   12
2        C   0.006     1.6    9
3        D   0.002     1.2   30
dtype: object
>>> print(Formatted_df.loc[0])
    A   0.002     1.2   32
>>> print(type(Formatted_df.loc[0]))
<class 'str'>

To write it to file you can then just use to_csv:

Formatted_df.to_csv('formatted_df.csv',index=False,header=False)

Note that this won't include any column names, so you may wish to initialize the output file then append to it:

output_fi='formatted_df.csv'
col_names=df.columns.tolist()
with open(output_fi,'w') as outfi: 
    outfi.write('# '+' '.join(col_names)+"\n")
    outfi.write('# '+format_string+"\n")
    
Formatted_df.to_csv(output_fi,mode='a',index=False,header=False)
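One caveat with going through a CSV writer: with default (minimal) quoting, any record that contains a comma or a double quote gets wrapped in quotes, which would shift the columns the Fortran code expects. The sketch below illustrates this with the standard library's csv module and compares it with plain writes. The two records are made up for illustration, and the second one contains a comma on purpose.

```python
import csv
import io

# Hypothetical pre-formatted records, standing in for the strings in Formatted_df.
lines = ["    A   0.002     1.2   32", "  B,2   0.004     1.4   12"]

# A CSV writer with minimal quoting wraps the record containing a comma in quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows([line] for line in lines)
csv_text = buf.getvalue()

# Plain writes leave every record exactly as it was formatted.
plain = io.StringIO()
plain.writelines(line + "\n" for line in lines)
plain_text = plain.getvalue()

print(csv_text)
print(plain_text)
```

If your records can contain commas or quotes, writing the strings yourself (for example, opening the file and writing "\n".join(Formatted_df) followed by a newline) sidesteps the quoting entirely.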

Also note that this assumes you know the ordering of your dataframe columns already.

Finally, note that you may run into memory issues with very large dataframes, because Formatted_df holds a formatted string for every row of df in memory at once. If that's the case, you'll have to process the dataframe in chunks!
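A minimal sketch of the chunked approach, using only the standard library so it stands on its own. The names write_fixed_width, format_row and chunk_size are hypothetical, and format_row uses Python format specs that should produce the same layout as (a5, f8.3, f8.1, i5) for values that fit their fields (Fortran's handling of overflowing values and over-long strings differs).

```python
from itertools import islice


def write_fixed_width(rows, format_row, path, chunk_size=10_000):
    """Stream rows to `path`, formatting at most `chunk_size` rows at a time."""
    rows = iter(rows)
    n_written = 0
    with open(path, "w") as out:
        while True:
            # Pull the next chunk; only this many rows are held in memory.
            chunk = list(islice(rows, chunk_size))
            if not chunk:
                break
            out.writelines(format_row(row) + "\n" for row in chunk)
            n_written += len(chunk)
    return n_written


def format_row(row):
    # Assumed Python stand-in for the Fortran format (a5, f8.3, f8.1, i5).
    sample_id, var1, var2, nobs = row
    return f"{sample_id:>5}{var1:8.3f}{var2:8.1f}{nobs:5d}"
```

With pandas, df.itertuples(index=False, name=None) could supply the rows one at a time, and a FortranRecordWriter could be used in place of format_row, for example lambda row: line_writer.write(list(row)), assuming it accepts the row values the same way as in the apply() call above.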

chris answered Oct 21 '25 15:10


