Zip two lists together based on matching date in string

Tags: python

I have two lists of files that I'm pulling from an FTP folder using:

sFiles = ftp.nlst(date+'sales.csv')
oFiles = ftp.nlst(date+'orders.csv')

This results in two lists that look something like:

sFiles = ['20170822_sales.csv','20170824_sales.csv','20170825_sales.csv','20170826_sales.csv','20170827_sales.csv','20170828_sales.csv']

oFiles = ['20170822_orders.csv','20170823_orders.csv','20170824_orders.csv','20170825_orders.csv','20170826_orders.csv','20170827_orders.csv']

With my real data set, something like the following gets my desired result:

for sales, orders in zip(sorted(sFiles), sorted(oFiles)):
    df = pd.concat(...)

However, there will be times when something goes wrong and one of the two files for a given date doesn't make it into the proper FTP folder, so I'd like code that creates an iterable from which I can extract the matched sales and orders file names based on date.
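To see why positional pairing is fragile, here is a small sketch using the sample lists above. It assumes, as in the examples, that every filename starts with an 8-digit `YYYYMMDD` date; `pairs` and `mismatched` are illustrative names. Because `zip()` pairs items purely by position, the missing 2017-08-23 sales file shifts every later pair by one day:

```python
sFiles = ['20170822_sales.csv', '20170824_sales.csv', '20170825_sales.csv',
          '20170826_sales.csv', '20170827_sales.csv', '20170828_sales.csv']
oFiles = ['20170822_orders.csv', '20170823_orders.csv', '20170824_orders.csv',
          '20170825_orders.csv', '20170826_orders.csv', '20170827_orders.csv']

# zip() pairs by position only, so one missing file misaligns every later pair
pairs = list(zip(sorted(sFiles), sorted(oFiles)))

# Compare the leading YYYYMMDD prefix (assumed filename layout) of each pair
mismatched = [(s, o) for s, o in pairs if s[:8] != o[:8]]

print(pairs[1])         # ('20170824_sales.csv', '20170823_orders.csv')
print(len(mismatched))  # 5
```

Only the first pair lines up; the other five combine files from different days.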

The following works, but I'm not sure how "pythonic" it is. Readability is poor, but since it is a list comprehension, I'd imagine there are some performance gains?

import re
[(sales, orders) for sales in sFiles for orders in oFiles if re.search(r'\d+', sales).group(0) == re.search(r'\d+', orders).group(0)]
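A comprehension can shave some interpreter overhead compared with an explicit loop, but it does not change the algorithm: the nested loops compare every sales file with every orders file and run two regex searches per pair, so the work grows with `len(sFiles) * len(oFiles)`. A minimal sketch of a linear alternative, assuming each date has at most one orders file, indexes the orders by date once and then looks each sales file up. `date_key`, `orders_by_date` and `matched` are hypothetical names:

```python
import re

sFiles = ['20170822_sales.csv', '20170824_sales.csv', '20170825_sales.csv',
          '20170826_sales.csv', '20170827_sales.csv', '20170828_sales.csv']
oFiles = ['20170822_orders.csv', '20170823_orders.csv', '20170824_orders.csv',
          '20170825_orders.csv', '20170826_orders.csv', '20170827_orders.csv']


def date_key(filename):
    """Return the first run of digits in filename (assumed to be the date), or None."""
    match = re.search(r'\d+', filename)
    return match.group(0) if match else None


# Build the lookup once: one regex search per orders file
# (assumes at most one orders file per date; a duplicate date keeps the last one)
orders_by_date = {date_key(o): o for o in oFiles}

# One regex search and one dict lookup per sales file
matched = []
for sales in sorted(sFiles):
    orders = orders_by_date.get(date_key(sales))
    if orders is not None:
        matched.append((sales, orders))

for sales, orders in matched:
    print(sales, orders)
```

Sales files with no orders file for the same date (here `20170828_sales.csv`) are simply skipped.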
Yale Newman asked Mar 08 '23 00:03


1 Answer

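You can take advantage of the pandas DataFrame index: parse the date out of each filename, use it as the index, and outer-join the two frames so that a date missing from either list shows up as NaN: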

import pandas as pd
sFiles = ['20170822_sales.csv','20170824_sales.csv','20170825_sales.csv','20170826_sales.csv','20170827_sales.csv','20170828_sales.csv']
oFiles = ['20170822_orders.csv','20170823_orders.csv','20170824_orders.csv','20170825_orders.csv','20170826_orders.csv','20170827_orders.csv']

s_dates = [pd.Timestamp.strptime(file[:8], '%Y%m%d') for file in sFiles]
s_df = pd.DataFrame({'sFiles': sFiles}, index=s_dates)

o_dates = [pd.Timestamp.strptime(file[:8], '%Y%m%d') for file in oFiles]
o_df = pd.DataFrame({'oFiles': oFiles}, index=o_dates)

df = s_df.join(o_df, how='outer')

which gives:

>>> print(df)
                        sFiles               oFiles
2017-08-22  20170822_sales.csv  20170822_orders.csv
2017-08-23                 NaN  20170823_orders.csv
2017-08-24  20170824_sales.csv  20170824_orders.csv
2017-08-25  20170825_sales.csv  20170825_orders.csv
2017-08-26  20170826_sales.csv  20170826_orders.csv
2017-08-27  20170827_sales.csv  20170827_orders.csv
2017-08-28  20170828_sales.csv                  NaN
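If the goal is still an iterable of matched file names, the joined frame can be split into complete and incomplete dates. A minimal sketch, rebuilding the same `df` so it runs on its own (using `pd.to_datetime` for the date parsing); `complete`, `missing` and `pairs` are illustrative names:

```python
import pandas as pd

sFiles = ['20170822_sales.csv', '20170824_sales.csv', '20170825_sales.csv',
          '20170826_sales.csv', '20170827_sales.csv', '20170828_sales.csv']
oFiles = ['20170822_orders.csv', '20170823_orders.csv', '20170824_orders.csv',
          '20170825_orders.csv', '20170826_orders.csv', '20170827_orders.csv']

# Index each list by the date parsed from its leading YYYYMMDD characters
s_df = pd.DataFrame({'sFiles': sFiles},
                    index=pd.to_datetime([f[:8] for f in sFiles], format='%Y%m%d'))
o_df = pd.DataFrame({'oFiles': oFiles},
                    index=pd.to_datetime([f[:8] for f in oFiles], format='%Y%m%d'))
df = s_df.join(o_df, how='outer')

# Dates with both files vs. dates where one of the two files is missing (NaN)
complete = df.dropna()
missing = df[df.isna().any(axis=1)]

# Matched (sales, orders) file names, in date order
pairs = list(zip(complete['sFiles'], complete['oFiles']))
for sales, orders in pairs:
    print(sales, orders)

print(missing.index.strftime('%Y-%m-%d').tolist())  # ['2017-08-23', '2017-08-28']
```

Keeping `missing` around makes it easy to report which day's upload failed instead of silently skipping it.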
Hazzles answered Mar 23 '23 02:03