Pandas split data frame into multiple csv's based on column value

Tags: python, pandas, csv

I have a question very similar to this one, but I need to take it a step further by saving the split data frames to CSV.

import pandas as pd
import numpy as np
import os

df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 1000),
                    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 1000),
                    'TIME_BIN': np.random.randint(1, 86400, size=1000),
                    'COUNT': np.random.randint(1, 700, size=1000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)

OUTPUT:
         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    476   Thursday  12:20:00
1     PHOENIX     50   Saturday  15:40:00
2       MIAMI    250     Friday  08:20:00
3     CHICAGO    358     Monday  15:40:00
4     PHOENIX    217   Thursday  22:10:00
5       MIAMI     12   Thursday  21:40:00
6      DENVER     22     Friday  10:30:00
7     CHICAGO    645     Sunday  23:40:00
8       MIAMI    188     Sunday  08:40:00

I want to make a separate data frame for each city and save each one as a .csv file. The code below works, but how do I do it Pythonically, without explicitly naming each city? The real data set has about 20 cities, so I don't want to paste this 20 times. I think it can be done in one or two lines with a for loop, but I don't know what that would look like. Something like "for city in df['CITY']"?

df_phoenix = df[df['CITY'] == "PHOENIX"]
df_atlanta = df[df['CITY'] == "ATLANTA"]
df_chicago = df[df['CITY'] == "CHICAGO"]
df_phoenix.to_csv(os.getcwd() + "/data_phoenix.csv")
df_atlanta.to_csv(os.getcwd() + "/data_atlanta.csv")
df_chicago.to_csv(os.getcwd() + "/data_chicago.csv")

263

asked Mar 01 '18 13:03

Calculus

1 Answer

I think you need groupby, either with a custom lambda function passed to apply or with a plain loop:

f = lambda x: x.to_csv(os.getcwd() + "/data_{}.csv".format(x.name.lower()), index=False)
df.groupby('CITY').apply(f)

Or with a loop:

for i, x in df.groupby('CITY'):
    x.to_csv(os.getcwd() + "/data_{}.csv".format(i.lower()), index=False)

EDIT, following a comment (thanks, @Anton vBR): build the path with os.path.join instead of string concatenation:

for i, x in df.groupby('CITY'):
    p = os.path.join(os.getcwd(), "data_{}.csv".format(i.lower()))
    x.to_csv(p, index=False)
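
For reuse, the loop above could be wrapped in a small helper. This is only a sketch: the function name split_to_csv, its parameters, and the tiny demo frame are made up for illustration, and pathlib is swapped in for os.path.join. It assumes the group keys are safe to use in file names.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_to_csv(df, column, out_dir, prefix="data_"):
    """Write one CSV per distinct value of `column`; return the paths written.

    Hypothetical helper, not part of pandas.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    written = []
    # groupby yields (key, sub-frame) pairs, one per distinct value of `column`
    for key, group in df.groupby(column):
        path = out_dir / f"{prefix}{str(key).lower()}.csv"
        group.to_csv(path, index=False)  # index=False leaves out the row index
        written.append(path)
    return written


# Small stand-in for the random frame in the question
demo = pd.DataFrame({
    "CITY": ["PHOENIX", "ATLANTA", "PHOENIX", "MIAMI"],
    "COUNT": [1, 2, 3, 4],
})
demo_dir = tempfile.mkdtemp()  # temporary folder, so nothing lands in the cwd
paths = split_to_csv(demo, "CITY", demo_dir)
print(sorted(p.name for p in paths))
```

With the frame from the question, split_to_csv(df, 'CITY', os.getcwd()) should write the same files as the loop above.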

157

answered Nov 15 '22 08:11

jezrael
