So I found out that the easiest way of grouping and counting elements is through itertools.
I have a list of employee records with departments (e.g. Accounting, Purchasing, Marketing, etc.), and it has over 500 entries. A sample of it:
# employee number, first name, last name, department, rate, age, birthdate
201601005,Raylene,Kampa,Purchasing,365,15,12/19/2001,;
200909005,Flo,Bookamer,Human Resources,800,28,12/19/1957,;
200512016,Jani,Biddy,Human Resources,565,20,8/7/1966,;
199806004,Chauncey,Motley,Admin,450,24,3/1/2000
What I intend to do is count all employees under each department, listing every department only once. The result should look like this (for example):
Accounting: 97
Marketing: 34
Purchasing: 45
The list is provided as a module, so I can't use csv to read it. The following is my code using itertools:
import empDataLT as x
from itertools import groupby
#Departments
def dept():
    empDept = list()  # converting empDataLT to list
    for em in x.a:
        empEm = em.strip().split(",")
        empDept.append(empEm)
    e = sorted(empDept, key=lambda x: x[3])  # sort data alphabetically
    b = []
    c = []
    for s in e:
        new_b = []
        new_c = []
        for value, repeated in groupby(s[3]):
            new_b.append(value)
            new_c.append(sum(1 for _ in repeated))
        b.append(new_b)
        c.append(new_c)
    print(b)
    print(c)
Here the imported empDataLT is the 500-record list provided as a module. However, this code produces the following result:
[['A', 'c', 'o', 'u', 'n', 't', 'i', 'n', 'g'], [['A', 'c', 'o', 'u', 'n', 't', 'i', 'n', 'g'],
[[1, 2, 1, 1, 1, 1, 1, 1, 1], [1, 2, 1, 1, 1, 1, 1, 1, 1],
Yes, apparently it counts the letters of the departments instead. I'm still learning Python so I am not quite sure how to fix it or any workarounds for this. Thank you in advance! Cheers.
PS: the empData is a string, but should be considered as a list.
One more thing, if it's not too much to ask: I also need to find which department has the highest number of employees. This is not that important, though; I can look it up myself. :D
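Using groupby is fine, but it needs sorted input, and it has to be applied to the whole list of records with the department as the key. Calling groupby(s[3]) on a single department string groups that string's characters, which is why you get letter counts.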
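For completeness, a groupby version could look roughly like the sketch below. It assumes the data is one string with records separated by ";" and fields by ",", as in the sample from the question; the names raw and department_counts are made up for this illustration, and the real data would come from the empDataLT module instead.

```python
from itertools import groupby

# Sample rows in the question's format (hypothetical stand-in for empDataLT).
raw = """201601005,Raylene,Kampa,Purchasing,365,15,12/19/2001,;
200909005,Flo,Bookamer,Human Resources,800,28,12/19/1957,;
200512016,Jani,Biddy,Human Resources,565,20,8/7/1966,;
199806004,Chauncey,Motley,Admin,450,24,3/1/2000"""


def department_counts(text):
    """Return {department: number of employees} using sort + groupby."""
    # One list of fields per record; skip empty chunks left by a trailing ";".
    rows = [rec.strip().split(",") for rec in text.split(";") if rec.strip()]
    # groupby only merges *adjacent* equal keys, so sort by department first.
    rows.sort(key=lambda row: row[3])
    # Group whole records by department (field index 3), not the characters
    # of a single department string.
    return {dept: sum(1 for _ in members)
            for dept, members in groupby(rows, key=lambda row: row[3])}


counts = department_counts(raw)
for dept, n in counts.items():
    print(f"{dept}: {n}")
```

Because the rows are sorted first, the departments come out in alphabetical order.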
Using a collections.defaultdict avoids sorting altogether:
from collections import defaultdict

s = """201601005,Raylene,Kampa,Purchasing,365,15,12/19/2001,;
200909005,Flo,Bookamer,Human Resources,800,28,12/19/1957,;
200512016,Jani,Biddy,Human Resources,565,20,8/7/1966,;
199806004,Chauncey,Motley,Admin,450,24,3/1/2000"""

# split into records on ";", then into fields on ","
data = [i.strip().split(",") for i in s.split(";")]

# group the records by department (field index 3)
grpd_data = defaultdict(list)
for d in data:
    grpd_data[d[3]].append(d)

print(grpd_data)
print()

# sort by length of list descending and enumerate it:
for idx, (key, value) in enumerate(sorted(grpd_data.items(), key=lambda i: -len(i[1])), 1):
    print(idx, key, value, len(value))
Output (manually formatted):
defaultdict(<class 'list'>, {
'Purchasing': [['201601005', 'Raylene', 'Kampa', 'Purchasing', '365', '15', '12/19/2001', '']],
'Human Resources': [[' 200909005', 'Flo', 'Bookamer', 'Human Resources', '800', '28', '12/19/1957', ''],
[' 200512016', 'Jani', 'Biddy', 'Human Resources', '565', '20', '8/7/1966', '']],
'Admin': [[' 199806004', 'Chauncey', 'Motley', 'Admin', '450', '24', '3/1/2000']]})
# with counts and sorted
1 Human Resources [[' 200909005', 'Flo', 'Bookamer', 'Human Resources', '800', '28', '12/19/1957', ''],
[' 200512016', 'Jani', 'Biddy', 'Human Resources', '565', '20', '8/7/1966', '']] 2
2 Purchasing [['201601005', 'Raylene', 'Kampa', 'Purchasing', '365', '15', '12/19/2001', '']] 1
3 Admin [[' 199806004', 'Chauncey', 'Motley', 'Admin', '450', '24', '3/1/2000']] 1
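If only the counts are needed (the "Accounting: 97" style of output from the question) and not the grouped rows themselves, collections.Counter may be a shorter route. This is a minimal sketch under the same assumptions about the data format as above; raw, dept_counts, top_dept and top_n are names chosen for this example only.

```python
from collections import Counter

# Sample rows in the question's format (hypothetical stand-in for empDataLT).
raw = """201601005,Raylene,Kampa,Purchasing,365,15,12/19/2001,;
200909005,Flo,Bookamer,Human Resources,800,28,12/19/1957,;
200512016,Jani,Biddy,Human Resources,565,20,8/7/1966,;
199806004,Chauncey,Motley,Admin,450,24,3/1/2000"""

# One list of fields per record; skip empty chunks left by a trailing ";".
rows = [rec.strip().split(",") for rec in raw.split(";") if rec.strip()]

# Count how often each department (field index 3) occurs; no sorting needed.
dept_counts = Counter(row[3] for row in rows)

# Print the departments alphabetically with their head counts.
for dept, n in sorted(dept_counts.items()):
    print(f"{dept}: {n}")

# most_common(1) returns a one-element list holding the (department, count)
# pair with the highest count.
top_dept, top_n = dept_counts.most_common(1)[0]
print("Largest department:", top_dept, top_n)
```

The last two lines cover the follow-up about which department has the most employees.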
Edit - bigger data:
from collections import defaultdict

# repeat the sample 201 times to simulate a bigger data set
big = s
for _ in range(200):
    big += ";" + s
s = big

data = [i.strip().split(",") for i in s.split(";")]

gr = defaultdict(list)
for d in data:
    gr[d[3]].append(d)

for idx, (key, value) in enumerate(sorted(gr.items(), key=lambda i: -len(i[1])), 1):
    print(idx, len(value))
Output:
1 402
2 201
3 201