How to limit duplicates to 5 per key in a pandas DataFrame?

Tags:

python

pandas

import pandas as pd

col1 = ['A','B','A','C','A','B','A','C','A','C','A','A','A']
col2 = [1,1,4,2,4,5,6,3,1,5,2,1,1]

df = pd.DataFrame({'col1': col1, 'col2': col2})

For A we have [1,4,4,6,1,2,1,1], which is 8 items, but I want to limit each key to at most 5 items when converting the DataFrame to a dict of lists.

Expected output:

Dict = {'A':[1,4,4,6,1],'B':[1,5],'C':[2,3,5]}
Sunil asked Aug 20 '19 06:08


1 Answer

Use pandas.DataFrame.groupby with apply, keeping only the first 5 values of each group via head(5):

df.groupby('col1')['col2'].apply(lambda x: list(x.head(5))).to_dict()

Output:

{'A': [1, 4, 4, 6, 1], 'B': [1, 5], 'C': [2, 3, 5]}
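
A possible variant, not part of the original answer: GroupBy.head(5) can trim each group first, and the lists can be built afterwards with agg(list), which avoids the lambda. This is only a sketch; the names `limited` and `result` are illustrative.

```python
import pandas as pd

col1 = ['A','B','A','C','A','B','A','C','A','C','A','A','A']
col2 = [1,1,4,2,4,5,6,3,1,5,2,1,1]
df = pd.DataFrame({'col1': col1, 'col2': col2})

# Keep at most the first 5 rows of each col1 group, in original row order.
limited = df.groupby('col1').head(5)

# Collect the remaining col2 values into one list per key.
result = limited.groupby('col1')['col2'].agg(list).to_dict()
print(result)
```

Both approaches keep rows in their original order within each key, so "first 5" means the first 5 occurrences in the DataFrame.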
Chris answered Oct 20 '22 00:10
