I have a very simple file:
[Name]
Streamline 1
[Data]
X [ m ], Y [ m ], Z [ m ], Velocity [ m s^-1 ]
2.66747564e-01, 0.00000000e+00, 2.03140453e-01, (0.00000000e+00, 8.17744827e+00, 0.00000000e+00)
2.66958952e-01, 0.00000000e+00, 2.07407191e-01, (0.00000000e+00, 6.77392197e+00, 0.00000000e+00)
2.63460875e-01, 0.00000000e+00, 2.06593186e-01, (0.00000000e+00, 7.04168701e+00, 0.00000000e+00)
2.65424699e-01, 0.00000000e+00, 2.00831652e-01, (0.00000000e+00, 8.93691921e+00, 0.00000000e+00)
2.70607203e-01, 0.00000000e+00, 2.02286631e-01, (0.00000000e+00, 8.45830917e+00, 0.00000000e+00)
2.68299729e-01, 0.00000000e+00, 1.97365344e-01, (0.00000000e+00, 1.00771456e+01, 0.00000000e+00)
...
I need to load the velocity as a vector, into a single row.
My basic code:
df = pd.read_csv("C:/Users/Marek/Downloads/0deg-5ms.csv", skiprows=5)
But this attempt leads to 1st 2 cols becoming index and the rest splits into 4 columns. index_col=False
can solve the issue with index, but leads to index out of range. I need a delimiter that implicitly tells pandas to ignore whatever is in brackets. I thought python ignore the separator withing brackets while reading a csv file might work but yes, I have spaces everywhere. I found some solutions that use extended functions to load files and handle them by lines, such as CSV file containing column with occasional comma in parentheses crashes pandas.read_csv and Load CSV with data surrounded by parentheses into a pandas dataframe . I however believe that this is a very easy scenario, as all lines are similar and can be solved by one-liner adding delimiter='some_regex'
. I however cannot figure out, how this regex should look. It should look for delimiter ,
but not (.*,.*)
.
I have tried with following, but this results in a single column:
df = pd.read_csv("C:/Users/Marek/Downloads/0deg-5ms.csv", skiprows=5, delimiter=',^(\(.*,.*\))')
EDIT: got to something like this - ,|(?:(\(.*,.*\)))
, but this adds an empty column after each comma.
index_col: This is to allow you to set which columns to be used as the index of the dataframe. The default value is None, and pandas will add a new column start from 0 to specify the index column. It can be set as a column name or column index, which will be used as the index column.
A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes. Write DataFrame to a comma-separated values (csv) file. Read a comma-separated values (csv) file into DataFrame.
Idea #2: Use CSVs rather than Excel FilesImporting csv files in Python is 100x faster than Excel files. We can now load these files in 0.63 seconds. That's nearly 10 times faster!
You can parse manually the file:
data = []
with open('data.csv') as fp:
[next(fp) for i in range(5)] # skiprows=5
headers = [c.strip() for c in next(fp).split(',')]
for line in fp:
data.append([i.strip() for i in re.split(r',(?![^\(]*[\)])', line)])
df = pd.DataFrame(data, columns=headers).apply(pd.eval)
Output:
>>> df
X [ m ] Y [ m ] Z [ m ] Velocity [ m s^-1 ]
0 0.266748 0.0 0.203140 [0.0, 8.17744827, 0.0]
1 0.266959 0.0 0.207407 [0.0, 6.77392197, 0.0]
2 0.263461 0.0 0.206593 [0.0, 7.04168701, 0.0]
3 0.265425 0.0 0.200832 [0.0, 8.93691921, 0.0]
4 0.270607 0.0 0.202287 [0.0, 8.45830917, 0.0]
5 0.268300 0.0 0.197365 [0.0, 10.0771456, 0.0]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 X [ m ] 6 non-null float64
1 Y [ m ] 6 non-null float64
2 Z [ m ] 6 non-null float64
3 Velocity [ m s^-1 ] 6 non-null object
dtypes: float64(3), object(1)
memory usage: 320.0+ bytes
>>> type(df.iloc[0, 3]) # [0.0, 8.17744827, 0.0]
list
>>> type(df.iloc[0, 3][1]) # 8.17744827
float
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With