I have the following dataframe:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.arange(10)
x = np.concatenate((x,x))
y = []
for i in range(2):
    # randint's upper bound is exclusive, so 11 gives integers 0..10 (random_integers is deprecated)
    y.append(np.random.randint(0, 11, 20))
d = {'A': [(x[i], y[0][i]) for i in range(20)],
     'B': [(x[i], y[1][i]) for i in range(20)]}
df = pd.DataFrame(d, index = list('aaaaaaaaaabbbbbbbbbb'))
df
A B
a (0, 2) (0, 10)
a (1, 0) (1, 8)
a (2, 3) (2, 8)
a (3, 7) (3, 8)
a (4, 8) (4, 10)
a (5, 2) (5, 0)
a (6, 1) (6, 4)
a (7, 3) (7, 9)
a (8, 4) (8, 4)
a (9, 4) (9, 10)
b (0, 0) (0, 3)
b (1, 2) (1, 10)
b (2, 8) (2, 3)
b (3, 1) (3, 7)
b (4, 6) (4, 1)
b (5, 8) (5, 3)
b (6, 1) (6, 4)
b (7, 1) (7, 1)
b (8, 2) (8, 7)
b (9, 9) (9, 3)
How do I make the following plots?
Plot 1 is on column 'A', 2 lines (one line for index = a, the other for index = b), x values are the first elements of the tuples. y values are the 2nd elements of the tuple.
Plot 2 is on column 'B'; the rest is the same as plot 1.
I cannot figure out how I can extract values from the tuples in the dataframe.
In addition, will groupby be helpful in this case?
In reality, I have about a thousand columns of data, 5 groups, each group ~500 rows. So I'm looking for a quick way to solve this (dataframe size ~2500 x 1000)
Thanks a lot
DataFrame.get_value() was used to quickly retrieve a single value from a data frame given a row label and a column label. It was deprecated in pandas 0.21 and removed in pandas 1.0; use DataFrame.at[row_label, column_label] instead.
The pd.DataFrame() constructor stores data in a tabular format. When it is given a list of tuples, each tuple becomes one row of the resulting DataFrame.
For example, dataframe.iloc[0:1, :] selects the first row of a dataframe with all of its columns, and dataframe.iloc[:, 0:1] selects the first column with all of its rows.
To get a cell value by position, use DataFrame.iloc[]. Positions are zero-based and run from 0 to length-1; use -1 to refer to the last column.
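As a rough illustration of the accessors described above, here is a minimal sketch on a small made-up frame. The data and the unique index labels 'x', 'y', 'z' are assumptions for the example; with repeated labels such as those in the question, .at would not return a single cell.

```python
import pandas as pd

# Hypothetical three-row frame with unique row labels (not the question's data).
df = pd.DataFrame(
    {"A": [(0, 2), (1, 0), (2, 3)], "B": [(0, 10), (1, 8), (2, 8)]},
    index=list("xyz"),
)

first_row = df.iloc[0:1, :]   # DataFrame: first row, all columns
first_col = df.iloc[:, 0:1]   # DataFrame: all rows, first column
last_cell = df.iloc[0, -1]    # scalar access: row 0, last column
by_label = df.at["y", "A"]    # label-based scalar access (replaces get_value)
```

Note that slicing with 0:1 keeps the result as a DataFrame, while plain integer positions return the cell itself.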
Here is how to unpack your tuples using zip. The * unpacks each column's tuples as separate arguments to zip.
df['A.x'], df['A.y'] = zip(*df.A)
df['B.x'], df['B.y'] = zip(*df.B)
>>> df.head()
A B A.x A.y B.x B.y
a (0, 6) (0, 0) 0 6 0 0
a (1, 8) (1, 4) 1 8 1 4
a (2, 8) (2, 5) 2 8 2 5
a (3, 5) (3, 2) 3 5 3 2
a (4, 2) (4, 4) 4 2 4 4
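With around a thousand columns, writing one line per column is impractical. One possible way to generalize the idea is to split every column in a loop and join the results. This is only a sketch on a small made-up frame; the names parts and wide and the sub-column labels 'x' and 'y' are chosen for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question (tuples per cell).
df = pd.DataFrame(
    {"A": [(0, 2), (1, 0), (0, 0), (1, 2)],
     "B": [(0, 10), (1, 8), (0, 3), (1, 10)]},
    index=list("aabb"),
)

# Build one small x/y frame per original column.
parts = {
    col: pd.DataFrame(df[col].tolist(), index=df.index, columns=["x", "y"])
    for col in df.columns
}

# Join side by side; the dict keys become the top level of a column MultiIndex.
wide = pd.concat(parts, axis=1)
```

After this, wide[("A", "y")] holds the y values of column 'A', and the original group labels are still available in the index.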
I think you can use indexing with str only:
df['a1'], df['a2'] = df['A'].str[0], df['A'].str[1]
df['b1'], df['b2'] = df['B'].str[0], df['B'].str[1]
print(df)
A B a1 a2 b1 b2
a (0, 5) (0, 1) 0 5 0 1
a (1, 0) (1, 5) 1 0 1 5
a (2, 3) (2, 9) 2 3 2 9
a (3, 3) (3, 8) 3 3 3 8
a (4, 7) (4, 9) 4 7 4 9
a (5, 9) (5, 4) 5 9 5 4
a (6, 3) (6, 3) 6 3 6 3
a (7, 5) (7, 0) 7 5 7 0
a (8, 2) (8, 3) 8 2 8 3
a (9, 4) (9, 5) 9 4 9 5
b (0, 7) (0, 0) 0 7 0 0
b (1, 6) (1, 2) 1 6 1 2
b (2, 8) (2, 3) 2 8 2 3
b (3, 8) (3, 8) 3 8 3 8
b (4, 10) (4, 1) 4 10 4 1
b (5, 1) (5, 3) 5 1 5 3
b (6, 6) (6, 3) 6 6 6 3
b (7, 7) (7, 3) 7 7 7 3
b (8, 7) (8, 7) 8 7 8 7
b (9, 8) (9, 0) 9 8 9 0
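To get from the extracted values to the plots asked about in the question, groupby on the index can produce one line per group. The following is a sketch under assumptions: the helper plot_tuple_column is a hypothetical name, the data is a shortened made-up version of the question's frame, and the tuples are unpacked directly with zip rather than through the helper columns created above.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_tuple_column(df, col, ax):
    """Draw one line per index label for a column holding (x, y) tuples."""
    for label, group in df.groupby(level=0):
        xs, ys = zip(*group[col])   # split the tuples into x and y sequences
        ax.plot(xs, ys, label=label)
    ax.set_title(col)
    ax.legend()

# Hypothetical data: two groups ('a' and 'b') with three points each.
df = pd.DataFrame(
    {"A": [(0, 2), (1, 0), (2, 3), (0, 0), (1, 2), (2, 8)],
     "B": [(0, 10), (1, 8), (2, 8), (0, 3), (1, 10), (2, 3)]},
    index=list("aaabbb"),
)

# One subplot per column, each with one line per group.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for col, ax in zip(df.columns, axes):
    plot_tuple_column(df, col, ax)
```

Calling plt.show() afterwards displays the figure. For a thousand columns you would probably loop over the columns in batches rather than draw them all in one figure.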