I have a problem appending to a DataFrame. I am trying to run this code:
    df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
    urls = pd.read_excel('url_june.xlsx')
    substr = urls.url.values.tolist()
    df_res = pd.DataFrame()
    for df in df_all:
        for i in substr:
            res = df[df['url'].str.contains(i)]
            df_res.append(res)
When I then try to save df_res, I get an empty DataFrame. df_all looks like this:
    ID,"url","used_at","active_seconds"
    b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
    b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
    f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
    d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
    078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
    d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
    d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
    d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9
and urls looks like this:
    url
    shoppingcart.aliexpress.com/order/confirm_order
    ozon.ru/?context=order_done&number=
    lk.wildberries.ru/basket/orderconfirmed
    lamoda.ru/checkout/onepage/success/quick
    mvideo.ru/confirmation?_requestid=
    eldorado.ru/personal/order.php?step=confirm
When I print res inside the loop, it is not empty. But when I print df_res inside the loop after the append, it is an empty DataFrame. I can't find my error. How can I fix it?
The syntax for using append on a Series is very similar to the DataFrame syntax. You type the name of the first Series, then .append() to call the method. Inside the parentheses, you type the name of the second Series, which you want to append to the end of the first.
When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame UNDER the first one. It will automatically detect whether the column names are the same and will stack accordingly. axis=1 will stack the columns in the second DataFrame to the RIGHT of the first DataFrame.
Use pandas.concat() to concatenate two or more pandas DataFrames across rows or columns. When you concat() two DataFrames on rows, it creates a new DataFrame containing all rows of both, which is effectively appending one DataFrame to the other.
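As a rough illustration of the two axis settings described above, here is a minimal sketch. The frames and the visits column are made up for the example and do not come from the question's data.

```python
import pandas as pd

# Hypothetical sample frames, not taken from the question's files.
top = pd.DataFrame({"url": ["a.ru", "b.ru"], "active_seconds": [1, 30]})
bottom = pd.DataFrame({"url": ["c.ru"], "active_seconds": [2]})
extra = pd.DataFrame({"visits": [5, 7]})

# axis=0 (the default) stacks rows; columns are aligned by name.
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1 places frames side by side; rows are aligned by index label.
side_by_side = pd.concat([top, extra], axis=1)

print(stacked.shape)       # (3, 2)
print(side_by_side.shape)  # (2, 3)
```

In both cases concat returns a new DataFrame and leaves its inputs untouched.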
append was deprecated (in pandas 1.4, and removed in pandas 2.0) because: "Series.append and DataFrame.append [are] making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place."
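If you look at the documentation for pd.DataFrame.append, it says:

    Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

The important part is "returning a new object": append does not modify df_res in place, so its result has to be assigned back.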
Try:

    df_res = df_res.append(res)
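Note that DataFrame.append was removed in pandas 2.0, so the line above only runs on older versions. The same "assign the result back" idea can be sketched with pd.concat, which works across versions. The starting row in df_res below is invented for the illustration.

```python
import pandas as pd

# Hypothetical starting frame and a match to add to it.
df_res = pd.DataFrame({"url": ["avito.ru/yoshkar-ola/telefony/mts"]})
res = pd.DataFrame({"url": ["shoppingcart.aliexpress.com/order/confirm_order"]})

# Result discarded: df_res is left unchanged, which is the bug in the question.
pd.concat([df_res, res])
rows_without_assignment = len(df_res)

# Result assigned back: df_res now contains the extra row.
df_res = pd.concat([df_res, res], ignore_index=True)
rows_with_assignment = len(df_res)

print(rows_without_assignment, rows_with_assignment)  # 1 2
```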
Incidentally, note that pandas isn't that efficient for creating a DataFrame by successive concatenations. You might try this, instead:
    all_res = []
    for df in df_all:
        for i in substr:
            res = df[df['url'].str.contains(i)]
            all_res.append(res)
    df_res = pd.concat(all_res)
This first creates a list of all the parts, then creates a DataFrame from all of them once at the end.
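One more thing worth checking: str.contains treats its argument as a regular expression by default, so a pattern such as ozon.ru/?context=order_done&number= will not match the literal "?" in a URL. The sketch below is one possible variant of the list-then-concat approach, assuming literal matching is what is wanted. It escapes each URL with re.escape and joins them into a single pattern, which also avoids the inner loop. The CSV text is a shortened, partly made-up sample read from memory instead of data.csv.

```python
import io
import re

import pandas as pd

# Shortened, partly invented sample standing in for data.csv.
csv_text = '''ID,"url","used_at","active_seconds"
b20f,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
d3b0,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0,"ozon.ru/?context=order_done&number=123",2015-10-01 00:05:00,3
'''

substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

# Escape regex metacharacters so each URL is matched literally,
# then combine all URLs into one alternation pattern.
pattern = "|".join(re.escape(s) for s in substr)

parts = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    parts.append(chunk[chunk["url"].str.contains(pattern)])

# Build the final frame once, after all chunks have been filtered.
df_res = pd.concat(parts, ignore_index=True)
print(df_res["url"].tolist())
```

Because each row is tested once against the combined pattern, a row that matches several URLs appears only once in df_res, unlike the nested-loop version.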
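    df_res = pd.DataFrame(data=None, columns=df.columns)
    all_res = []
    # Take the 10 rows before the row labelled index (.loc slices include both ends).
    # .ix was removed in pandas 1.0; .loc is the label-based equivalent here.
    d1 = df.loc[index - 10:index - 1]
    all_res.append(d1)
    df_res = pd.concat(all_res)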