Select non-null rows from a specific column in a DataFrame and take a sub-selection of other columns

You can build a boolean mask from notnull() on the 'Survive' column and use it to select rows, together with a list of the columns of interest:

In [2]:
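# make some data
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 7), columns=['Survive', 'Age', 'Fare', 'Group_Size', 'deck', 'Pclass', 'Title'])
# set one value to NaN; .loc avoids chained assignment, and np.nan is the spelling that still exists in NumPy 2.x
df.loc[2, 'Survive'] = np.nan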
df
Out[2]:
    Survive       Age      Fare  Group_Size      deck    Pclass     Title
0  1.174206 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  0.036843  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
2       NaN -0.132394 -0.236904   -0.324087  0.570660  0.758084 -0.176421
3 -2.145934 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.197144 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482

Now pass the mask to loc to keep only the rows where 'Survive' is not NaN:

In [3]:
xtrain = df.loc[df['Survive'].notnull(), ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
xtrain

Out[3]:
        Age      Fare  Group_Size      deck    Pclass     Title
0 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
3 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482
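
The same selection on a small hand-written frame, so the result does not depend on random data. This is only a minimal sketch: the values and the reduced column set are made up for illustration, and the mask is stored in a variable (here called `mask`) to show what loc receives.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Survive' is missing in row 2.
df = pd.DataFrame({
    'Survive': [1.0, 0.0, np.nan, 1.0],
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# Boolean Series: True where 'Survive' is present, False where it is NaN.
mask = df['Survive'].notnull()

# Row selector is the mask, column selector is the list of wanted columns.
xtrain = df.loc[mask, ['Age', 'Fare']]
print(xtrain)
```

Row 2 is dropped and the original index labels of the remaining rows are kept.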

Two alternatives because... well why not?
Both drop the NaN rows before slicing the columns, which makes two calls rather than the single loc call in EdChum's answer.

one

df.dropna(subset=['Survive'])[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]

two

# NaN never compares equal to itself, so this keeps only the rows where Survive is not NaN
df.query('Survive == Survive')[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
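
A quick way to convince yourself that the loc approach and the two alternatives agree is to run all three on the same frame and compare the results. The sketch below assumes a small made-up frame with float columns; the variable names (`cols`, `via_loc`, `via_dropna`, `via_query`) are only for illustration.

```python
import numpy as np
import pandas as pd

cols = ['Age', 'Fare']

# Hypothetical toy data: 'Survive' is missing in row 1.
df = pd.DataFrame({
    'Survive': [1.0, np.nan, 0.0],
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.92],
})

# One call: mask rows and pick columns together.
via_loc = df.loc[df['Survive'].notnull(), cols]

# Two calls: drop rows with NaN in 'Survive', then slice columns.
via_dropna = df.dropna(subset=['Survive'])[cols]

# Two calls: filter with the NaN != NaN trick, then slice columns.
via_query = df.query('Survive == Survive')[cols]

# DataFrame.equals compares values, index, and dtypes.
same = via_loc.equals(via_dropna) and via_loc.equals(via_query)
print(same)
```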
