I am attempting to split a column and keep only the third item as the column value, using the following:
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
I have also tried these variations:
df1['gene_name'] = df1.iloc[:,'gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1['gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1.gene_name.str.split(';', expand=True)[2]
But it always produces this warning:
find_target_genes.py:19: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
I have also tried using 4 (the column index) instead of 'gene_name', but this results in an error.
How can I make this work? I've looked through the documentation, but I don't think I fully understand it, since I can't figure out what's wrong.
Here is an example of 2 of the values I am trying to split (yes, each line is all in one column):
ID "A" ; version "B" ; name "C" ; source "D" ; transcript "C"
ID "A1" ; version "B1" ; name "C1" ; source "D1" ; transcript "C1"
I would like the column to contain only name "C" and drop the rest.
The problem is not on the right-hand side of the assignment but on the left-hand side. You are assigning to df1['gene_name'] instead of df1.loc[:,'gene_name'], which is the form the User Guide recommends. With your form of assignment, "it’s very hard to predict whether it will return a view or a copy". Depending on the "memory layout of the array", the value may end up being set on a copy rather than on the DataFrame you intended. So, you should be doing:
df1.loc[:,'gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
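As a minimal sketch of the assignment above on the sample data from the question: the DataFrame `raw`, the `score` column, and the filter used to build `df1` are hypothetical stand-ins, since the question does not show how `df1` was created. The sketch also assumes `df1` was itself sliced from another DataFrame, which is a common situation in which this warning appears; in that case, taking an explicit `.copy()` first is a widely used way to make `df1` independent. The `.str.strip()` call is an addition to remove the spaces left around each piece after splitting on ';'.

```python
import pandas as pd

# Hypothetical stand-in for the original data; only 'gene_name' comes from the question.
raw = pd.DataFrame({
    "gene_name": [
        'ID "A" ; version "B" ; name "C" ; source "D" ; transcript "C"',
        'ID "A1" ; version "B1" ; name "C1" ; source "D1" ; transcript "C1"',
    ],
    "score": [1, 2],
})

# Assumption: df1 is derived from another DataFrame by filtering.
# The explicit .copy() makes df1 an independent DataFrame.
df1 = raw[raw["score"] > 0].copy()

# Split on ';', keep the third piece (index 2), and strip surrounding spaces.
third = df1.loc[:, "gene_name"].str.split(";", expand=True)[2].str.strip()

# Assign through .loc, as suggested in the answer.
df1.loc[:, "gene_name"] = third

print(df1["gene_name"].tolist())
```

If `df1` is not a slice of another DataFrame in your script, the `.copy()` line is unnecessary and the `.loc` assignment alone is the relevant part.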