
Delete special characters from multiple csv files using batch file

I want to delete all the special characters in my CSV files using a batch file. Each CSV file has a single column containing only keywords to be entered in Google.

For example:

1. Ecommerce
2. dentist Melbourne cbd?
3. dentists Melbourne %
4. best dentist in Melbourne!

Sometimes the files contain Arabic or Chinese characters as well.

When I add these files to the Google AdWords Keyword Planner, it shows an error. If I ignore the error, I get the wrong number of hits for the keyword, so to avoid the error I need to remove all the special characters from my CSV files.

I have hundreds of CSV files and want to save each updated file (without special characters) over the existing file.

I tried

@echo off
set source_folder=C:\Users\Username\Documents\iMacros\Datasources\a
set target_folder=C:\Users\Username\Documents\iMacros\Datasources\keyfords-csv-file
if not exist %target_folder% mkdir %target_folder%

for /f %%A in ('dir /b %source_folder%\*.csv') do (
    for /f "skip=1 tokens=1,2* delims=," %%B in (%source_folder%\%%A) do (
    echo %%B>>%target_folder%\%%A
    )
)

timeout /t 20

But it ended up deleting all the records from the CSV file.

Is there any way I can either:

1. Accept only standard characters, that is A-Z, a-z, and 0-9.

2. Or delete every character that appears in a string of special characters I define, like string1="?%!@#$^&*<>"

3. Or specify in the CSV file that it should accept only standard English characters.

Is there any way to achieve this using a batch file or any framework?

Thanks

Penny asked Nov 12 '14 06:11



1 Answer

I think this is much cleaner in PowerShell.

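$sourceFolder = "C:\Users\Username\Documents\iMacros\Datasources\a"
$targetFolder = "C:\Users\Username\Documents\iMacros\Datasources\keyfords-csv-file"
# Create the target folder; ignore the error if it already exists
MkDir $targetFolder -ErrorAction Ignore

$fileList = Dir $sourceFolder -Filter *.csv

ForEach($file in $fileList)
{
    # Strip unwanted characters line by line. $file.Name is used so the
    # target path is built from the file name only, never from a full path.
    $file | Get-Content | %{$_ -replace '[^\w\s,\"\.]',''} | Set-Content -Path "$targetFolder\$($file.Name)"
}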

I take every file from the source folder, read its contents, remove every unwanted character, and save the result to another file. The work is done by a small regex, '[^\w\s,\"\.]', used with the -replace operator. The caret ^ at the start of the character class negates it, so the pattern matches anything that is not a word character \w, a whitespace character \s, a comma ,, a double quote \", or a period \., and every match is replaced with an empty string.
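To make the effect of the pattern concrete, here is a small sketch that applies it to the sample keywords from the question. The `$samples` array and the `$cleaned` variable are only illustrative names, not part of the script above.

```powershell
# Illustrative sample lines taken from the question.
$samples = @(
    'dentist Melbourne cbd?',
    'dentists Melbourne %',
    'best dentist in Melbourne!'
)

# Same pattern as in the answer: remove everything except word characters,
# whitespace, commas, double quotes and periods.
$cleaned = $samples | ForEach-Object { $_ -replace '[^\w\s,\"\.]', '' }

# Print the cleaned lines.
$cleaned
```

Whitespace is kept by the pattern, so the space before the removed % stays at the end of the second line. Calling .Trim() on each result would be one way to drop it, if that matters for the import.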

Someone may find a better regex for your needs, but I think you get the idea.
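One possible stricter variant, as a sketch only: in .NET regular expressions \w also matches letters from other scripts, so Arabic or Chinese characters would pass through the pattern above. If the goal is option 1 from the question (only A-Z, a-z, and 0-9), an explicit ASCII class may be closer to what is needed. The function names `Remove-NonAsciiCharacter` and `Update-CsvInPlace` below are hypothetical helpers, not part of the original answer, and keeping spaces and commas is an assumption about what the keyword files need.

```powershell
# Hypothetical helper: keep only ASCII letters, digits, spaces and commas.
function Remove-NonAsciiCharacter {
    param([string]$Line)
    # -creplace is the case-sensitive form of -replace, so the class means
    # exactly the listed ASCII ranges.
    $Line -creplace '[^A-Za-z0-9 ,]', ''
}

# Hypothetical helper: clean every .csv file in a folder and overwrite it.
function Update-CsvInPlace {
    param([string]$Folder)
    Get-ChildItem -Path $Folder -Filter *.csv | ForEach-Object {
        $path = $_.FullName
        # The parentheses make Get-Content read the whole file first,
        # so Set-Content can safely write back to the same path.
        (Get-Content -Path $path) |
            ForEach-Object { Remove-NonAsciiCharacter $_ } |
            Set-Content -Path $path
    }
}

# Quick check on single lines; [char]0x7259 is a Chinese character,
# written as a code point to keep this script ASCII-only.
Remove-NonAsciiCharacter 'best dentist in Melbourne!'
Remove-NonAsciiCharacter "dentist $([char]0x7259) Melbourne?"
```

Because `Update-CsvInPlace` overwrites the originals, it is probably worth trying it on a copy of the folder first, for example `Update-CsvInPlace -Folder 'C:\Temp\csv-copy'` (a made-up path).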

kevmar answered Oct 13 '22 07:10
