I need to read columns of complex numbers in the format:
# index; (real part, imaginary part); (real part, imaginary part)
1 (1.2, 0.16) (2.8, 1.1)
2 (2.85, 6.9) (5.8, 2.2)
NumPy seems great for reading in columns of data with only a single delimiter, but the parentheses seem to ruin any attempt at using numpy.loadtxt().
Is there a clever way to read in the file with Python, or is it best to just read the file, remove all of the parentheses, then feed it to NumPy?
This will need to be done for thousands of files so I would like an automated way, but maybe NumPy is not capable of this.
Here's a more direct way than @Jeff's answer: tell loadtxt to load the data straight into a complex array, using a helper function parse_pair that maps (1.2,0.16) to 1.20+0.16j:
>>> import re
>>> import numpy as np
>>> pair = re.compile(r'\(([^,\)]+),([^,\)]+)\)')
>>> def parse_pair(s):
...     # "(1.2,0.16)" -> complex(1.2, 0.16)
...     return complex(*map(float, pair.match(s).groups()))
...
>>> s = '''1 (1.2,0.16) (2.8,1.1)
... 2 (2.85,6.9) (5.8,2.2)'''
>>> from io import StringIO  # Python 3; the original used cStringIO on Python 2
>>> f = StringIO(s)
>>> np.loadtxt(f, delimiter=' ', dtype=complex,
...            converters={1: parse_pair, 2: parse_pair})
array([[ 1.00+0.j , 1.20+0.16j, 2.80+1.1j ],
       [ 2.00+0.j , 2.85+6.9j , 5.80+2.2j ]])
Or in pandas:
>>> import pandas as pd
>>> f.seek(0)
>>> pd.read_csv(f, delimiter=' ', index_col=0, names=['a', 'b'],
... converters={1: parse_pair, 2: parse_pair})
             a           b
1  (1.2+0.16j)  (2.8+1.1j)
2  (2.85+6.9j)  (5.8+2.2j)
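Note that the sample in the question has a space after each comma, as in (1.2, 0.16), so splitting on a single space as above would cut each pair in two. Here is a minimal standard-library sketch that tolerates that whitespace. The names PAIR and parse_line are made up for illustration, and it assumes every data line starts with an integer index and that comment lines start with #.

```python
import re

# Matches "(real, imag)" with optional whitespace around either number.
PAIR = re.compile(r'\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)')

def parse_line(line):
    """Hypothetical helper: return (index, [complex, ...]) for one data line."""
    index = int(line.split(None, 1)[0])
    values = [complex(float(real), float(imag))
              for real, imag in PAIR.findall(line)]
    return index, values

sample = """# index; (real part, imaginary part); (real part, imaginary part)
1 (1.2, 0.16) (2.8, 1.1)
2 (2.85, 6.9) (5.8, 2.2)"""

# Skip blank lines and the '#' comment header.
rows = [parse_line(line) for line in sample.splitlines()
        if line.strip() and not line.startswith('#')]
```

The resulting list of rows can then be passed to np.array or a DataFrame constructor if an array is needed.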
Since this issue is still not resolved in pandas, let me add another solution. You could modify your DataFrame with a one-liner after reading it in:
import pandas as pd
df = pd.read_csv('data.csv')
df = df.apply(lambda col: col.apply(lambda val: complex(val.strip('()'))))
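As far as I can tell, this one-liner only works when every cell is a string that Python's complex() accepts once the parentheses are stripped, such as '(1.2+0.16j)', which is how Python itself prints a complex number. It would not handle the '(1.2, 0.16)' pairs from the question. A small sketch of the difference, using made-up cell values:

```python
# Strings shaped like Python's own text form of a complex number.
cells = ['(1.2+0.16j)', '(2.85-6.9j)']
parsed = [complex(c.strip('()')) for c in cells]

# A "(real, imag)" pair is not a valid complex literal, so complex() rejects it.
try:
    complex('(1.2, 0.16)'.strip('()'))
    pair_ok = True
except ValueError:
    pair_ok = False
```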
If your file only has 5 columns like you've shown, you could feed it to pandas with a regex for conversion, replacing the parentheses with commas on every line. After that, you could combine them as suggested in this SO answer to get complex numbers.
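A rough sketch of that flatten-then-combine idea in plain Python, without pandas. It is a variant rather than the exact recipe: it replaces the parentheses and commas with spaces instead of commas, and the helper names flatten and combine are hypothetical.

```python
def flatten(line):
    """Hypothetical helper: drop parentheses and commas, leaving 5 numeric fields."""
    cleaned = line.replace('(', ' ').replace(')', ' ').replace(',', ' ')
    return [float(x) for x in cleaned.split()]

def combine(fields):
    """Pair columns (1, 2) and (3, 4) into complex numbers; column 0 is the index."""
    idx, r1, i1, r2, i2 = fields
    return int(idx), complex(r1, i1), complex(r2, i2)

row = combine(flatten('1 (1.2, 0.16) (2.8, 1.1)'))
```

The same pairing (real column plus 1j times the imaginary column) carries over to DataFrame columns once the five numeric fields have been read in.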
Pandas makes this easier: unlike the numpy version, its read_csv method accepts a regex as the delimiter, so the parentheses can be consumed by the delimiter itself and each field passed to a simple converter like this.
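import pandas as pd
from io import StringIO

f_str = "1 (2, 3) (5, 6)\n2 (3, 4) (4, 8)\n3 (0.2, 0.5) (0.6, 0.1)"
buf = StringIO(f_str)

def complex_converter(txt):
    # "2, 3" or "5, 6)" -> "2+3j"; "+-" collapses to "-" for a negative imaginary part
    txt = txt.strip("()").replace(", ", "+").replace("+-", "-") + "j"
    return complex(txt)

# A regex delimiter needs the Python parsing engine; header=None keeps the first row as data
df = pd.read_csv(buf, delimiter=r" \(|\) \(", engine="python", header=None,
                 converters={1: complex_converter, 2: complex_converter}, index_col=0)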
EDIT: Looks like @Dougal came up with this just before I posted this... it really just depends on how you want to handle the complex number. I like being able to avoid the explicit use of the re module.
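To see what the converter receives, here is a sketch that applies the same delimiter pattern with re.split. This only approximates how read_csv splits a line, so treat it as an illustration rather than a description of pandas internals. Note that the last field keeps its trailing parenthesis, which is why the converter strips "()" before building the number.

```python
import re

DELIM = r" \(|\) \("  # same pattern as the read_csv delimiter above

# Each line splits into the index, then the bare "real, imag" fields.
fields = re.split(DELIM, "1 (2, 3) (5, 6)")

def complex_converter(txt):
    # Strip stray parentheses, join the parts with "+", fix "+-", append "j".
    txt = txt.strip("()").replace(", ", "+").replace("+-", "-") + "j"
    return complex(txt)

values = [complex_converter(f) for f in fields[1:]]
neg = complex_converter("2, -3")  # exercises the "+-" -> "-" replacement
```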