
Performing str.split on a column in a dataframe returns a SettingWithCopyWarning

I am attempting to split a column and keep only the third item as the column value using the following

df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]

I have also tried these variations

df1['gene_name'] = df1.iloc[:,'gene_name'].str.split(';', expand=True)[2]

df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]

df1['gene_name'] = df1['gene_name'].str.split(';', expand=True)[2]

df1['gene_name'] = df1.gene_name.str.split(';', expand=True)[2]

But it always returns this warning

find_target_genes.py:19: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]

I have also tried using 4 (the column index) instead of gene_name, but this results in an error.

How can I make this work? I've looked through the documentation, but I don't think I fully understand it, since I can't figure out what's wrong.

Here is an example of two of the values I am trying to split (yes, each line is all in one column):

ID "A" ; version "B" ; name "C" ; source "D" ;  transcript "C"
ID "A1" ; version "B1" ; name "C1" ; source "D1" ;  transcript "C1"

I would like the column to contain only name "C" and drop the rest.

keenan asked Jul 02 '26 06:07



1 Answer

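The problem is not on the right-hand side of the assignment but on the left. You are assigning with df1['gene_name'] instead of df1.loc[:,'gene_name'], which is what both the warning message and the User Guide recommend. With the first form, in the guide's words, "it's very hard to predict whether it will return a view or a copy", because that depends on the "memory layout of the array", so the assignment may end up modifying a temporary copy rather than df1 itself. Note that the warning can also appear when df1 itself was created as a slice of another DataFrame; in that case, taking an explicit copy at that point (for example with .copy()) is what removes it. For the assignment itself, you should be doing: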

df1.loc[:,'gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
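
As a minimal sketch of the assignment above on data shaped like the two sample rows in the question: the frame name raw, the score column, and the filter that produces df1 are all hypothetical, since the question does not show how df1 was built. The explicit .copy() and the trailing .str.strip() are additions that go beyond the one-liner above. The copy covers the case where df1 is a slice of another DataFrame, and the strip removes the spaces left around each piece by the " ; " separators.

```python
import pandas as pd

# Hypothetical sample data modelled on the two rows shown in the question.
raw = pd.DataFrame({
    "gene_name": [
        'ID "A" ; version "B" ; name "C" ; source "D" ;  transcript "C"',
        'ID "A1" ; version "B1" ; name "C1" ; source "D1" ;  transcript "C1"',
    ],
    "score": [1, 2],  # hypothetical extra column, used only for filtering
})

# Assumption: df1 was produced by filtering another DataFrame. The explicit
# .copy() makes df1 an independent object rather than a possible slice.
df1 = raw[raw["score"] > 0].copy()

# Split on ';', keep the third piece (position 2), and trim the whitespace
# left around it by the ' ; ' separators.
df1.loc[:, "gene_name"] = (
    df1.loc[:, "gene_name"].str.split(";", expand=True)[2].str.strip()
)

print(df1["gene_name"].tolist())  # ['name "C"', 'name "C1"']
```

Because df1 is an explicit copy, raw keeps its original, unsplit strings. Whether the warning is emitted at all depends on the pandas version and on how df1 was created, so the sketch only checks the resulting values.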
Fergui answered Jul 04 '26 19:07



