
Files with .sql extension identified as binary in Mercurial [duplicate]




Possible Duplicate:
Why does Mercurial think my SQL files are binary?

I generated a complete set of scripts for the stored procedures in a database. When I created a Mercurial repository and added these files, they were all added as binary. I still get the benefits of versioning, but I lose the efficiency, diffing, and other advantages that come with text files. I verified that these files are indeed all just text.

Why is it doing this?

What can I do to avoid it?

Is there a way to get Hg to change its mind about these files?

Here is a snippet of changeset log:

   496.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFindCustomerByMatchCode.StoredProcedure.sql has changed
   497.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFindUnreconcilableChecks.StoredProcedure.sql has changed
   498.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixBadLabelSelected.StoredProcedure.sql has changed
   499.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOPL.StoredProcedure.sql has changed
   500.1 Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOrderMoneyError.StoredProcedure.sql has changed

Thanks in advance for your help. Jim

Jim Reineri asked Sep 16 '10 09:09

2 Answers

In fitting with Mercurial's views on binary files, it does not actually track file types, which means that there is no way for a user to mark a file as binary or not binary.

As tonfa and Rudi mentioned, Mercurial decides whether a file is binary by checking for a NUL byte anywhere in the file. In UTF-16 and UTF-32 files a NUL byte is pretty much guaranteed, because every ASCII character is stored with at least one zero byte.
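To see why the encoding matters, here is a minimal Python sketch of the NUL-byte check described above. The function name `looks_binary` is made up for illustration and is not Mercurial's own code; it only mimics the heuristic as described in this answer.

```python
def looks_binary(data: bytes) -> bool:
    """Approximate the heuristic described above: any NUL byte means 'binary'."""
    return b"\0" in data


sql = "SELECT 1;\n"
utf8_bytes = sql.encode("utf-8")    # one byte per ASCII character, no zero bytes
utf16_bytes = sql.encode("utf-16")  # BOM plus two bytes per character, one of them zero

print(looks_binary(utf8_bytes))   # False
print(looks_binary(utf16_bytes))  # True
```

The same text is treated as text or as binary depending only on how it was encoded on disk.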

To "fix" this, you would have to ensure that the files are encoded with UTF-8 instead of UTF-16. Ideally, your database would have a setting for Unicode encoding when doing the export. If that's not the case, another option would be to write a precommit hook to do it (see How to convert a file to UTF-8 in Python for a start), but you would have to be very careful about which files you were converting.
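As a rough sketch of the conversion step such a hook could call, the Python below rewrites a file as UTF-8 only when it starts with a UTF-16 byte order mark, which is one way of being careful about which files get touched. The function name `convert_utf16_to_utf8` is hypothetical, and wiring it into an actual Mercurial hook is not shown here.

```python
import codecs
from pathlib import Path


def convert_utf16_to_utf8(path: Path) -> bool:
    """Rewrite path as UTF-8 if it starts with a UTF-16 BOM; return True if converted."""
    raw = path.read_bytes()
    # The UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM,
    # so rule out UTF-32 first and leave such files alone.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return False
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = raw.decode("utf-16")
    # Write bytes directly so line endings are preserved; no BOM is written.
    path.write_bytes(text.encode("utf-8"))
    return True
```

Files without a UTF-16 BOM are left untouched, so running it twice on the same file is harmless. It overwrites the original, so try it on a copy first.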

tghw answered Nov 09 '22 04:11


I know it's a bit late, but I was evaluating Kiln and came across this problem. After discussion with the guys at Fogbugz who couldn't give me an answer other than "File/Save As" from SSMS for every *.sql file (very tedious), I decided to have a look at writing a quick script to convert the *.sql files.

Fortunately you can use one Microsoft technology (Powershell) to (sort of) overcome an issue with another Microsoft technology (SSMS) - using Powershell, change to the directory that contains your *.sql files and then copy and paste the following into the Powershell shell (or save as a .ps1 script and run it from Powershell - make sure to run the command "Set-ExecutionPolicy RemoteSigned" before trying to run a .ps1 script):

function Get-FileEncoding
{
  [CmdletBinding()] Param (
    [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)] [string]$Path
  )

  # Read the first four bytes of the file to look for a byte order mark (BOM).
  # "-Encoding byte" is the Windows PowerShell form; PowerShell 6+ uses -AsByteStream instead.
  [byte[]]$byte = get-content -Encoding byte -ReadCount 4 -TotalCount 4 -Path $Path

  if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )
  { Write-Output 'UTF8' }
  # FE FF: UTF-16 big endian
  elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
  { Write-Output 'Unicode' }
  # FF FE: UTF-16 little endian (what SSMS writes)
  elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
  { Write-Output 'Unicode' }
  elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
  { Write-Output 'UTF32' }
  elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)
  { Write-Output 'UTF7'}
  # No recognised BOM: assume ASCII
  else
  { Write-Output 'ASCII' }
}

# Convert every UTF-16 *.sql file in the current directory to UTF-8, in place.
$files = Get-ChildItem "*.sql"
foreach ( $file in $files )
{
  $encoding = Get-FileEncoding $file
  if ($encoding -eq 'Unicode')
  {
    (Get-Content "$file" -Encoding Unicode) | Set-Content -Encoding UTF8 "$file"
  }
}

The function Get-FileEncoding is courtesy of http://poshcode.org/3227, although I had to modify it slightly to cater for UCS-2 little endian files, which is how SSMS seems to have saved these. I would recommend backing up your files first, as the script overwrites the originals. You could, of course, modify the script so that it saves a UTF-8 copy of each file instead, e.g. change the last line of code to say:

(Get-Content "$file" -Encoding Unicode) | Set-Content -Encoding UTF8 "$file.new"

The script should be easy to modify to traverse subdirectories as well.

Now you just need to remember to run this if there are any new *.sql files, before you commit and push your changes. Any files already converted and subsequently opened in SSMS will stay as UTF-8 when saved.
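If remembering is the weak point, a small check run before committing can list the files that still need converting. The following is a sketch in Python rather than PowerShell; the function name `find_utf16_sql_files` is made up for illustration. It searches subdirectories as well and only reports files, it does not modify them.

```python
import codecs
from pathlib import Path

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def find_utf16_sql_files(root: Path) -> list[Path]:
    """Return .sql files under root (recursively) whose first two bytes are a UTF-16 BOM."""
    hits = []
    for path in sorted(root.rglob("*.sql")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            head = handle.read(2)
        # Note: a UTF-32 LE file starts with the same two bytes and is listed too.
        if head in UTF16_BOMS:
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Print every file under the current directory that still needs converting.
    for path in find_utf16_sql_files(Path(".")):
        print(path)
```

An empty listing means every *.sql file under the current directory is free of a UTF-16 byte order mark.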

misterjaytee answered Nov 09 '22 04:11
