I have a pandas DataFrame with a single column and a lot of data.
I need to access each of the elements, not to change it (with apply()) but to parse it into another function.
When I loop through the DataFrame, the loop always stops after the first one.
If I convert it to a list first, my numbers are all in brackets (e.g. [12] instead of 12), which breaks my code.
Does anyone see what I am doing wrong?
    import pandas as pd

    def go_trough_list(df):
        for number in df:
            print(number)

    df = pd.read_csv("my_ids.csv")
    go_trough_list(df)
df looks like:
       1
    0  2
    1  3
    2  4
    dtype: object
    [Finished in 1.1s]
Edit: I found one mistake. My first value is recognized as a header. So I changed my code to:
    df = pd.read_csv("my_ids.csv", header=None)
But with

    for ix in df.index:
        print(df.loc[ix])

I get:

    0    1
    Name: 0, dtype: int64
    0    2
    Name: 1, dtype: int64
    0    3
    Name: 2, dtype: int64
    0    4
    Name: 3, dtype: int64
Edit: Here is my solution, thanks to jezrael and Nick!
First I added header=None to read_csv because my data has no header.
Then I changed my function to:

    def go_through_list(df):
        new_list = df[0].apply(my_function, parameter=par1)
        return new_list

And it works perfectly! Thank you again guys, problem solved.
To iterate over rows, you can use itertuples(), which returns a named tuple for each row in the DataFrame. The first element of the tuple is the row's index value, and the remaining elements are the row's values.
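As a rough illustration of the itertuples() description, here is a minimal sketch. The DataFrame contents are made up to resemble the ids in the question, and positional access (row[1]) is used because a numeric column label such as 0 is not a valid field name for the named tuple.

```python
import pandas as pd

# Hypothetical single-column frame, similar to reading the ids with header=None.
df = pd.DataFrame([1, 2, 3, 4])

values = []
for row in df.itertuples():
    # row[0] is the index label; row[1] is the value in the first column.
    values.append(row[1])

print(values)
```

Each value arrives as a plain scalar here, so it can be handed straight to another function instead of being appended to a list.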
Iterate over DataFrame columns: one simple way to iterate over the columns of a pandas DataFrame is a for loop. Iterating over the DataFrame itself yields the column labels, and you can use each label with the getitem syntax ([]) to select that column. The values attribute returns the underlying data as a NumPy array; call tolist() on it to get a list.
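The behaviour described here probably explains both symptoms in the question: a loop over a one-column DataFrame runs once (over the single column label), and converting the whole frame to a list gives one inner list per row. A small sketch with invented column names "a" and "b":

```python
import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Iterating over the DataFrame yields column labels, not rows.
labels = [label for label in df]

for label in df:
    column = df[label]  # select the column (a Series) by its label
    print(label, column.tolist())

# Converting a one-column DataFrame gives one inner list per row ...
nested = df[["a"]].values.tolist()
# ... while converting the column itself gives a flat list.
flat = df["a"].values.tolist()
```

Here nested holds [[1], [2], [3]] while flat holds [1, 2, 3], which is the difference between "[12]" and "12" mentioned in the question.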
Using DataFrame.iterrows(): pandas DataFrame.iterrows() is used to iterate over DataFrame rows. It yields (index, Series) pairs, where index is the row's index label and the Series holds the row's data. To get a value from the Series, use the column name, for example row["Fee"].
pandas DataFrame iterrows() method: iterrows() returns an iterator over the rows of the DataFrame. Each iteration produces an index label and a row object (a pandas Series).
You can use the index as in other answers, or iterate through the df and access each row like this:

    for index, row in df.iterrows():
        print(row['column'])

However, I suggest solving the problem differently if performance is of any concern. Also, if there is only one column, a pandas Series is the more appropriate structure.
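One possible way to follow the Series suggestion is sketched below. The CSV text is a stand-in for my_ids.csv (its real contents are not shown in the question), and squeeze("columns") is just one way to turn a one-column DataFrame into a Series.

```python
import io
import pandas as pd

# Stand-in for my_ids.csv; the real file contents are an assumption.
csv_text = "1\n2\n3\n4\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# A one-column DataFrame squeezed along the columns axis becomes a Series.
ids = df.squeeze("columns")

# Iterating over a Series yields the values themselves, not labels.
collected = [int(number) for number in ids]
print(collected)
```

With a Series, a plain for loop visits every value, so no index lookups or list conversions are needed.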
What do you mean by "parse it into another function"? Perhaps you want to take each value, do something to it, and store the result in another column?
I need to access each of the elements, not to change it (with apply()) but to parse it into another function.
Perhaps this example will help:
    import pandas as pd

    df = pd.DataFrame([20, 21, 12])

    def square(x):
        return x**2

    df['new_col'] = df[0].apply(square)  # can use a lambda here nicely
You can select the column as a Series and convert it to a list with tolist:

    for x in df['Colname'].tolist():
        print(x)
Sample:

    import pandas as pd

    df = pd.DataFrame({'a': pd.Series([1, 2, 3]),
                       'b': pd.Series([4, 5, 6])})
    print(df)
       a  b
    0  1  4
    1  2  5
    2  3  6

    for x in df['a'].tolist():
        print(x)
    1
    2
    3
If you have only one column, use iloc to select the first column by position:

    for x in df.iloc[:, 0].tolist():
        print(x)
Sample:

    import pandas as pd

    df = pd.DataFrame({1: pd.Series([2, 3, 4])})
    print(df)
       1
    0  2
    1  3
    2  4

    for x in df.iloc[:, 0].tolist():
        print(x)
    2
    3
    4
This can work too, but it is not the recommended approach, because the column label 1 can be a number or a string, and using the wrong type raises a KeyError:

    for x in df[1].tolist():
        print(x)
    2
    3
    4
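To make the KeyError risk concrete, here is a small sketch. It assumes a file whose lines are 1, 2, 3, 4 (simulated with an in-memory string) read without header=None, so the first line becomes the column label.

```python
import io
import pandas as pd

# Simulated file contents (an assumption); the first line "1" is taken as the header.
df = pd.read_csv(io.StringIO("1\n2\n3\n4\n"))

# The label read from the file is the string "1", not the integer 1.
label = df.columns[0]

try:
    df[1]  # integer label does not exist
    raised = False
except KeyError:
    raised = True

# Positional selection works regardless of the label's type.
by_position = df.iloc[:, 0].tolist()
print(repr(label), raised, by_position)
```

Selecting by position with iloc avoids having to know whether the label is 1 or "1".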