I have a table in a pandas DataFrame, df:
id count price
1 2 100
2 7 25
3 3 720
4 7 221
5 8 212
6 2 200
I want to create a new DataFrame (df2) from this, selecting the rows where count is 2 and price is 100, or count is 7 and price is 221.
My expected output is df2:
id count price
1 2 100
4 7 221
I am trying df[df['count'] == '2' & df['price'] == '100']
but I get this error:
TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
You need to add parentheses, because & has higher precedence than ==:
df3 = df[(df['count'] == '2') & (df['price'] == '100')]
print (df3)
id count price
0 1 2 100
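To see why the parentheses matter, here is a small sketch that uses Python's standard ast module to inspect how the unparenthesized expression is grouped. The names a, b, c and d are placeholders standing in for df['count'], '2', df['price'] and '100'; nothing is evaluated, the expression is only parsed.

```python
import ast

# Parse the unparenthesized expression without evaluating it.
tree = ast.parse("a == b & c == d", mode="eval").body

# The top-level node is a chained comparison: a == (b & c) == d
is_chained_compare = isinstance(tree, ast.Compare) and len(tree.ops) == 2

# The middle operand is the bitwise AND, which Python groups first.
middle = tree.comparators[0]
middle_is_bitand = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitAnd)

print(is_chained_compare, middle_is_bitand)
```

So without parentheses Python first tries to compute '2' & df['price'], which is not the comparison that was intended.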
If you need to check multiple values, use isin:
df4 = df[(df['count'].isin(['2','7'])) & (df['price'].isin(['100', '221']))]
print (df4)
id count price
0 1 2 100
3 4 7 221
But if the columns are numeric, compare against numbers instead of strings:
df3 = df[(df['count'] == 2) & (df['price'] == 100)]
print (df3)
df4 = df[(df['count'].isin([2,7])) & (df['price'].isin([100, 221]))]
print (df4)