I have a large dataframe with 1739 rows and 1455 columns. I want to find the 150 lowest values for each row (not just the 150th value, but all 150 values).
I iterate over rows with a basic for loop.
I tried df.min(axis=1), but it only returns a single minimum per row. I also tried the rolling_min function, without success.
Is there any existing function where I can specify how many minimum values I want to find?
My ultimate goal is to take the 150 lowest values, build a curve (slope) from them, and calculate the area under that curve. I want to do this for each row and add up the areas to obtain a volume.
Example of the dataframe; I have a df that looks like this:
     -218.7     -218.4    ...          217.2      217.5
0     56.632706  13.638315    ...      76.543000  76.543000
1     56.633455  13.576762    ...      76.543000  76.543000
2    -18.432203 -18.384091    ...      76.543000  76.543000
3    -18.476594 -18.439804    ...      76.543000  76.543000
The header ('-218.7 ...') holds the x-axis coordinates of a scan. The data are the scan heights on the y axis. What I need is the 150 lowest values for each row and their associated column headers, as I want to make a curve for each row and then calculate the area under the curve.
So for each row I need something like this:
         -218.7     -218.4    ... for 150 columns
4    -18.532035 -18.497517    ... for 150 values
I don't think I need to store the header info for each row; a for loop would go through each row one at a time.
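For reference, pandas does have a built-in for this: Series.nsmallest(n) returns the n smallest values of a Series and keeps their original labels. A minimal row-wise sketch on toy data (df and N here are placeholders, not the real scan frame):

import pandas as pd
import numpy as np

# Toy stand-in for the scan data; the real frame is 1739 x 1455.
df = pd.DataFrame(np.random.randn(5, 20))
N = 3  # 150 in the real data

# Series.nsmallest keeps the original column labels (the x coordinates)
# as the index of each returned Series, so the header info is preserved.
lowest_per_row = [row.nsmallest(N) for _, row in df.iterrows()]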
Use .argsort to get the indices that would sort the underlying array row-wise. Slice those indices to pull out both the values and the column labels you need. We'll create a MultiIndex so we can store the column headers and the values in the same DataFrame; the second level of the column index is the nth-lowest indicator.
import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(1, 100000, (1739, 26)))
df.columns = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

N = 7  # 150 in your case

# Indices that would sort each row ascending, truncated to the N smallest.
idx = np.argsort(df.to_numpy(), axis=1)[:, :N]

# Take the N smallest values and their column labels using the same indices.
pd.concat([pd.DataFrame(np.take_along_axis(df.to_numpy(), idx, axis=1), index=df.index),
           pd.DataFrame(df.columns.to_numpy()[idx], index=df.index)],
          keys=['Value', 'Columns'], axis=1)
      Value                                           Columns                  
          0      1      2      3      4      5      6       0  1  2  3  4  5  6
0      5193   7752   8445  19947  20610  21441  21759       C  K  U  V  I  G  P
1       432   3607  16278  17138  19434  26104  33879       R  J  W  C  B  D  G
2        16   1047   1845   9553  12314  13784  19432       K  S  E  F  M  O  U
3       244   5272  10836  13682  29237  33230  34448       K  Q  A  S  X  W  G
4      9765  11275  13160  22808  30870  33484  42760       K  T  L  U  C  D  M
5      2034   2179   4980   7184  14826  15238  22807       Z  H  F  Q  L  R  X
...
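A side note on the design choice: a full argsort sorts every row end to end. If you only need the N smallest values themselves and not their column labels, np.partition avoids the full sort; a minimal sketch reusing the df and N defined above:

import numpy as np
import pandas as pd

# np.partition moves each row's N smallest values into the first N slots
# (in arbitrary internal order); sort that slice if ascending order matters.
lowest = np.sort(np.partition(df.to_numpy(), N - 1, axis=1)[:, :N], axis=1)
lowest_df = pd.DataFrame(lowest, index=df.index)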
You can use heapq.nsmallest to find the n smallest numbers in a list. This can be quickly applied to each row of a dataframe using .apply:
import pandas as pd
import numpy as np
import heapq
df = pd.DataFrame(np.random.randn(1000, 1000))
# Find the 150 smallest values in each row
smallest = df.apply(lambda x: heapq.nsmallest(150, x), axis=1)
Each row of smallest is now a list of the 150 smallest values in the corresponding row in df. 
This can be converted to a dataframe using:
smallest_df = pd.DataFrame(smallest.values.tolist())
This is now a dataframe where each row corresponds to a row in the original dataframe, and its 150 columns hold the 150 smallest values from that row.
smallest_df.head()
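For the final step described in the question (build a curve from the 150 lowest values of each row, integrate it, and sum the areas into a volume), here is a rough sketch. It assumes the x coordinates are evenly spaced (0.3 apart in the example header) and that the curve is simply the sorted lowest values; adjust dx to the real scan resolution.

import numpy as np

dx = 0.3  # assumed spacing between consecutive x coordinates (0.3 in the example header)

# Trapezoidal rule over each row's sorted lowest values, then sum the areas into a volume.
areas = np.trapz(np.sort(smallest_df.to_numpy(), axis=1), dx=dx, axis=1)
volume = areas.sum()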
