I am building functions to help me load data from the web. The problem I am trying to solve is that column names differ depending on the source. For example, Yahoo Finance column headings look like this: Open, High, Low, Close, Volume, Adj Close. Data sets from Quandl.com have DATE, VALUE, date, value, etc. The mix of upper and lower case throws everything off, and Value and Adj. Close mean the same thing for the most part. I want to map columns with different names but the same meaning to one value. For example, Adj. Close and value both = AC; Open, OPEN, and open all = O.
So I have a CSV file ("Functions//ColumnNameChanges.txt") that stores the dict() keys and values for the column names:
    Date,D
    Open,O
    High,H
Then I wrote this function to populate my dictionary:
    def DictKeyValuesFromText ():
        Dictionary = {}
        TextFileName = "Functions//ColumnNameChanges.txt"
        with open(TextFileName,'r') as f:
            for line in f:
                x = line.find(",")
                y = line.find("/")
                k = line[0:x]
                v = line[x+1:y]
                Dictionary[k] = v
        return Dictionary
This is the output of print(DictKeyValuesFromText()):
    {'': '', 'Date': 'D', 'High': 'H', 'Open': 'O'}
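The empty '' key in the output above most likely comes from a blank line in the file: find(",") returns -1 on a line that contains only a newline, so both slices come out empty. Below is a minimal sketch of a loader that avoids this, assuming the same two-column "name,alias" layout. The function name load_column_aliases and its path parameter are hypothetical and not part of the original code.

```python
import csv


def load_column_aliases(path):
    """Read 'name,alias' rows into a dict, skipping blank or malformed rows."""
    aliases = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # csv.reader yields [] for a blank line; ignore any row
            # that does not have exactly two fields.
            if len(row) != 2:
                continue
            name, alias = (field.strip() for field in row)
            if name:
                aliases[name] = alias
    return aliases
```

Using the csv module also means a final line without a trailing newline is read in full, which the slicing approach does not guarantee.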
The next function is where my problem is:
    def ChangeColumnNames(DataFrameFileLocation):
        x = DictKeyValuesFromText()
        df = pd.read_csv(DataFrameFileLocation)
        for y in df.columns:
            if y not in x.keys():
                i = input("The column " + y + " is not in the list, give a name:")
                df.rename(columns={y:i})
            else:
                df.rename(columns={y:x[y]})
        return df
df.rename is not working. This is the output I get from print(ChangeColumnNames("Tvix_data.csv")):
    The column Low is not in the list, give a name:L
    The column Close is not in the list, give a name:C
    The column Volume is not in the list, give a name:V
    The column Adj Close is not in the list, give a name:AC
             Date        Open        High         Low       Close  Volume  \
    0  2010-11-30  106.269997  112.349997  104.389997  112.349997       0
    1  2010-12-01   99.979997  100.689997   98.799998  100.689997       0
    2  2010-12-02   98.309998   98.309998   86.499998   86.589998       0
The column names should be D, O, H, L, C, V, AC. I am missing something; any help would be appreciated.
Using the rename() function: pandas has a built-in method called rename() for changing column names.
You can use df.replace({"Courses": dict}) to remap values in a pandas DataFrame column with dictionary values. It also supports regular-expression substitutions. Note that replace() changes cell values, not column names.
The pandas rename() method is used to rename any index, column or row label.
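As a small illustration of renaming both column and row labels, the sketch below uses a made-up two-row DataFrame (its column names and index labels are assumptions for the example only). It also shows that rename() accepts a callable, which may help with the upper/lower case differences described in the question.

```python
import pandas as pd

# Hypothetical data, only to demonstrate the calls.
df = pd.DataFrame({"OPEN": [1.0, 2.0], "close": [1.5, 2.5]}, index=["a", "b"])

# A callable is applied to every column label; here it upper-cases them.
upper = df.rename(columns=str.upper)

# index= renames row labels; labels missing from the dict are left alone.
rows = df.rename(index={"a": "first"})

print(list(upper.columns))  # ['OPEN', 'CLOSE']
print(list(rows.index))     # ['first', 'b']
```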
df.rename works just fine, but it is not in-place by default. Either re-assign its return value or use inplace=True. It expects a dictionary with old names as keys and new names as values.
    df = df.rename(columns={'col_a': 'COL_A', 'col_b': 'COL_B'})
or
    df.rename(columns={'col_a': 'COL_A', 'col_b': 'COL_B'}, inplace=True)
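Applied to the function in the question, one possible fix is to collect all the new names in a single dictionary and call rename() once, keeping its return value. This is only a sketch: the name rename_columns, the aliases argument and the ask parameter (a stand-in for input() so the function can be tried without typing) are assumptions, not part of the original code.

```python
import pandas as pd


def rename_columns(df, aliases, ask=input):
    """Return a copy of df with its columns renamed according to aliases.

    Columns missing from aliases are resolved by calling ask(prompt).
    """
    mapping = {}
    for col in df.columns:
        if col in aliases:
            mapping[col] = aliases[col]
        else:
            mapping[col] = ask("The column " + col + " is not in the list, give a name:")
    # rename() returns a new DataFrame, so the result has to be kept.
    return df.rename(columns=mapping)


# Hypothetical sample data; the lambda answers every prompt with "L".
df = pd.DataFrame({"Date": ["2010-11-30"], "Open": [106.27], "Low": [104.39]})
renamed = rename_columns(df, {"Date": "D", "Open": "O"}, ask=lambda prompt: "L")
print(list(renamed.columns))  # ['D', 'O', 'L']
```

Because the result is returned rather than modified in place, the caller's original DataFrame keeps its old column names.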