I have a set of csv files, and for each file, the first line contains the column names for a data set. Some csv files have all uppercase column names; others have all lowercase column names. How do I change every csv file so that its first line (aka the column names) is all uppercase?
My attempt was the following: first, I manually checked all the files to see which ones had uppercase and which had lowercase column names, then I ran the following commands:
head -1 uppercase.csv > header.csv
# repeated all commands below for each lowercase file individually
sed -i 1d lowercase.csv
cat header.csv lowercase.csv > lowercase_new.csv
rm lowercase.csv
mv lowercase_new.csv lowercase.csv
I want to know if there is a more automated way to do this, without going through each file manually.
Dataset1.csv
a b c
x x x
Dataset2.csv
A B C
y y y
How do I make Dataset1.csv look like the following?
A B C
x x x
The following simple awk command may help you with this too.
awk 'NR==1{$0=toupper($0)} 1' Input_file
Explanation:
NR==1: checks whether the current line is the first line; if it is, do the following:
$0=toupper($0): converts the current line to UPPER CASE and saves it back into $0.
1: awk works on a condition-then-action model, so here I make the condition TRUE without specifying any action; awk then performs its default action and prints the current line.
In case you want to save the output into Input_file itself, append > temp_file && mv temp_file Input_file to the above solution, where Input_file is the data file you want to change (the file you pass to awk).
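To cover many files without checking each one by hand, the one-liner and the temp_file step can be wrapped in a loop. A minimal sketch, assuming a POSIX shell; the function name upcase_first_line and the per-file temp name "$f.tmp" are hypothetical choices for illustration. Because toupper leaves an already-uppercase header unchanged, it is safe to run on every file, uppercase or not.

```shell
# Hypothetical helper: uppercase line 1 of every file given as an argument.
# Each result goes to "$f.tmp" (hypothetical name) and replaces the original
# only if awk succeeded; on failure the temp file is removed.
upcase_first_line() {
  for f in "$@"; do
    awk 'NR==1{$0=toupper($0)} 1' "$f" > "$f.tmp" && mv "$f.tmp" "$f" || {
      rm -f "$f.tmp"
      return 1
    }
  done
}
```

Usage: upcase_first_line *.csv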
You can do it with GNU sed (the \U uppercase escape and -i without a backup suffix are GNU extensions):
$ sed -i -e '1 s/\(.*\)/\U\1/' input.csv
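With GNU sed, -i implies -s, so each input file is treated separately and address 1 matches the first line of every file. That means one sed call can fix all files at once, with no loop. A sketch assuming GNU sed; the wrapper name upcase_headers_sed is hypothetical, and '1s/.*/\U&/' is an equivalent shorter form of the command above (& stands for the whole match).

```shell
# Hypothetical wrapper, assumes GNU sed:
#  -i edits files in place and (in GNU sed) implies -s, so "1" is line 1 of EACH file
#  \U uppercases the rest of the replacement; & is the whole matched line
upcase_headers_sed() {
  sed -i '1s/.*/\U&/' "$@"
}
```

Usage: upcase_headers_sed *.csv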
Just to point out the obvious, your commands can perfectly well be put into a script and executed on a set of files.
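#!/bin/sh
# Save the (already uppercase) header line from a known-good file
head -n 1 uppercase.csv > header.csv
# "for lowercase" with no "in" list loops over the command-line arguments
for lowercase; do
sed -i 1d "$lowercase"                           # drop the old header line
cat header.csv "$lowercase" > "$lowercase"_new   # prepend the uppercase header
rm "$lowercase"
mv "$lowercase"_new "$lowercase"
done
rm -f header.csv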
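Save it as headerfix, then make it executable with chmod +x ./headerfix, then run it with ./headerfix lower1.csv lower2.csv lower3.csv to fix the headers in those three files. Like your original commands, it copies the header of uppercase.csv, so it only works if every file has the same columns.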
A proper production script would use properly randomized temporary file names (use mktemp) and take care to clean them up even if it is interrupted (use trap); and I guess the entire loop body could be refactored into a single sed script (in which case no loop is necessary), but you already have good answers which do that elegantly.
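Following the mktemp and trap suggestion, a hardened variant might look like the sketch below. It keeps the same approach (copy the first line of uppercase.csv onto each file named as an argument) but is written as a shell function, and its structure is a hypothetical rewrite rather than the code above. mktemp is not POSIX but is available on GNU and BSD systems. Writing the result back with cat, rather than mv, keeps each original file's permissions.

```shell
# Hypothetical hardened headerfix: same idea as the script above, but with
# unique temp names (mktemp) and cleanup on interruption (trap).
headerfix() {
  header=$(mktemp) || return 1
  body=$(mktemp) || { rm -f "$header"; return 1; }
  # On hangup, Ctrl-C or kill: remove both temp files, then stop
  trap 'rm -f "$header" "$body"; exit 1' HUP INT TERM
  status=0
  if head -n 1 uppercase.csv > "$header"; then
    for f in "$@"; do
      # New contents = uppercase header + everything after line 1 of the old file
      if ! { cat "$header" > "$body" && tail -n +2 "$f" >> "$body" && cat "$body" > "$f"; }; then
        status=1
        break
      fi
    done
  else
    status=1
  fi
  # Normal cleanup, then restore default signal handling
  rm -f "$header" "$body"
  trap - HUP INT TERM
  return "$status"
}
```

Usage, from the directory containing uppercase.csv: headerfix lower1.csv lower2.csv lower3.csv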