I'm doing machine learning computations with two dataframes: one for the factors and one for the target values. I have to split both into training and testing parts. I think I've found a way, but I'm looking for a more elegant solution. Here is my code:
import pandas as pd
import numpy as np
import random
df_source = pd.DataFrame(np.random.randn(5,2),index = range(0,10,2), columns=list('AB'))
df_target = pd.DataFrame(np.random.randn(5,2),index = range(0,10,2), columns=list('CD'))
rows = np.asarray(random.sample(range(0, len(df_source)), 2))
df_source_train = df_source.iloc[rows]
df_source_test = df_source[~df_source.index.isin(df_source_train.index)]
df_target_train = df_target.iloc[rows]
df_target_test = df_target[~df_target.index.isin(df_target_train.index)]
print('rows')
print(rows)
print('source')
print(df_source)
print('source train')
print(df_source_train)
print('source_test')
print(df_source_test)
---- edited - solution by unutbu (modified) ----
np.random.seed(2013)
percentile = .6
rows = np.random.binomial(1, percentile, size=len(df_source)).astype(bool)
df_source_train = df_source[rows]
df_source_test = df_source[~rows]
df_target_train = df_target[rows]
df_target_test = df_target[~rows]
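Note that the binomial mask above gives a training set whose size varies from run to run (each row is kept with probability 0.6). If a fixed number of training rows is wanted, one possible variation is sketched below. The helper name split_pair, its parameters, and the use of np.random.default_rng are my own assumptions, not part of unutbu's answer; it also assumes both dataframes have the same number of rows in the same order.

```python
import numpy as np
import pandas as pd

def split_pair(df_source, df_target, train_fraction=0.6, seed=2013):
    """Split two row-aligned dataframes with one shared boolean mask.

    Hypothetical helper: assumes both frames have the same length
    and that row i of df_source corresponds to row i of df_target.
    """
    rng = np.random.default_rng(seed)
    n_rows = len(df_source)
    n_train = int(round(train_fraction * n_rows))
    # Mark exactly n_train randomly chosen positions as training rows.
    mask = np.zeros(n_rows, dtype=bool)
    mask[rng.choice(n_rows, size=n_train, replace=False)] = True
    return df_source[mask], df_source[~mask], df_target[mask], df_target[~mask]

df_source = pd.DataFrame(np.random.randn(5, 2), index=range(0, 10, 2), columns=list('AB'))
df_target = pd.DataFrame(np.random.randn(5, 2), index=range(0, 10, 2), columns=list('CD'))
src_train, src_test, tgt_train, tgt_test = split_pair(df_source, df_target)
```

Because the same mask is applied to both dataframes, the train and test parts stay aligned row for row.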
Below you can find my solution, which doesn't involve any extra variables.
1. Use the .sample method to get a sample of your data.
2. Use the .index attribute of the sample to get its indexes.
3. Slice the second dataframe by that index.
E.g., let's say you have X and y and you want a sample of 10 rows from each. It should be the same rows in both, of course.
X_sample = X.sample(10)
y_sample = y.loc[X_sample.index]  # .loc selects rows by label; plain y[...] would select columns if y is a DataFrame
I like Alexander's answer, but I would add an index reset before sampling. The full code:
# index reset
X.reset_index(inplace=True, drop=True)
y.reset_index(inplace=True, drop=True)
# sampling
X_sample = X.sample(10)
y_sample = y.loc[X_sample.index]  # .loc selects rows by label, for a Series or a DataFrame
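Resetting the index gives X and y the same 0..n-1 labels, so the rows sampled from X can be matched in y without problems. This assumes X and y have the same number of rows in the same order.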
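The same idea can be extended from a single sample to a full train/test split, which is what the question asks for. This is only a sketch under some assumptions: the data below is made up, the 0.6 fraction and random_state=2013 are borrowed from the edit in the question, and the index must be unique and shared by X and y for .drop and .loc to pick the right rows.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; X and y share the default 0..19 index.
X = pd.DataFrame(np.random.randn(20, 2), columns=list('AB'))
y = pd.Series(np.random.randn(20), name='target')

# Sample the training rows from X, then take everything else as the test set.
X_train = X.sample(frac=0.6, random_state=2013)
X_test = X.drop(X_train.index)

# Select the matching rows of y by label.
y_train = y.loc[X_train.index]
y_test = y.loc[X_test.index]
```

No extra row variable is needed here: the index of the sampled part of X drives every other selection.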