Python Pandas Remove Duplicate Cells issue

Question

I am fetching data from a big DB, and I am trying to represent. I want to use pandas to trim it however, as the SQL DB query uses several joins and the returned table actually has a lot of duplicate cells. I found the following question:

Python Pandas Remove Duplicate Cells - Keep the rows

This does the job for me, however with some bugs. I think I might be using it wrong. Can someone explain to me a bit how it works?

I my case (see the spinet) I use a column called Product as the main axis of the loc (if this is correct to be said like that). Afterwards I am trying to filter the duplicate cells in ALL the other columns from the dataframe and replace with empty string:

df.loc[df.Product.duplicated(), ['Type', 'ID', 'Device', 'MID',
                                'Technical Description', 'Zx', 'Package', 'Connector',
                                'Wavelength', 'Chip', 'Code', 'FType',
                                'Orientation']] = ' '

This removes cells from the Device column which actually are not duplicates. I really can't get it. I apologize if this is stupid!

Thanks a lot!

EDIT: Here is the code for the dataframe and DB connections:

with connect(database='measurements') as connSQL:    # Use a context handler for the connection
    pdQuery = pd.read_sql_query(''' SQL CODE ''', connSQL)

df = pd.DataFrame(pdQuery, columns=['Product', 'Type', 'ID', 'Device', 'MID',
                                    'Technical Description', 'Zx', 'Package', 'Connector',
                                    'Wavelength', 'Chip', 'Code', 'FType',
                                    'Orientation'])

Output for only 1 product selected:

Without cell filtering:
With cell filtering: The code described above.

Is there a way to selectively remove cells? Should I apply this to several columns in order to leave only the unique values per column?

jezrael · Accepted Answer

If need replace duplicated values to empty strings for each column separately use DataFrame.mask with Series.duplicated per each column by DataFrame.apply:

df = df.mask(df.apply(lambda x: x.duplicated()), '')

Python Pandas Remove Duplicate Cells issue

Tags:

python

pandas

pr07ecH70r

1 Answers

jezrael

Recent Activity

Donate For Us

Python Pandas Remove Duplicate Cells issue

Tags:

python

pandas

pr07ecH70r

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us