I want to produce an Excel workbook that will contain proprietary formulas and other intellectual property. I will most likely produce this workbook in Visual Studio 2010 with C#.
How protected is the code within these projects? Do the same problems with reflection in C# still apply to these workbook projects? If so, are there obfuscation tools specifically for these types of workbooks?
To add to this question - what about good 'ole VBA? Is there a way to protect that code too if we were to go the VBA route?
To protect your code, open the Excel Workbook and go to Tools>Macro>Visual Basic Editor (Alt+F11). Now, from within the VBE go to Tools>VBAProject Properties and then click the Protection page tab and then check "Lock project from viewing" and then enter your password and again to confirm it.
To lock whole workbook from opening, Thisworkbook. password option can be used in VBA. If you want to Protect Worksheets, then you have to first Lock the cells with option Thisworkbook. sheets.
The entire Microsoft Office suite has a password protection tool you can use to secure your Excel spreadsheets, Word documents, PowerPoint presentations, and so on. In Excel, head to File > Info. Select Protect Workbook, then Encrypt with Password from the dropdown menu.
As other answers have suggested, VBA security in Excel 2003 (.xls) is flawed; a password is only a little annoyance, most often than not for the developper himself. IMHO you're absolutely better off with VSTO, even if VBA security in Excel 2007+ (.xlsx) is much better, not to mention the VBA IDE which hasn't seen improvements in centuries (although no so true with the right VBE add-in...).
A solution could be to write the sensible code in a separate library which your VSTO project calls; instead of intricate formulas you could expose functions which would appear as "black boxes" (i.e. an abstraction of the formula's functionality); how it's implemented is just not available to anyone without the source code), and without the dll the workbook couldn't be calculated (would be #NAME?
all over the place).
For example, instead of =some proprietary formula returning a Smurf
, users would see =Smurf(SomeTableColumnName, SomeRangeName, SomeRangeReference, SomeCellReference)
. Of course this comes at a performance cost, so I would only do this for stuff that's really important to protect.
I think a user that manages to extract the proprietary formulas from such an on-the-side library, deserves to leave with them - between that and Alt+F11, there's quite a step; the proprietary code is as "protected" as any other .net assembly is.
UPDATE
Still without using any 3rd-party libraries, if the proprietary code does not need Excel interoperability, a perhaps better solution could be to create the function in VBA (i.e. leave the Excel interop part within Excel) and only pass primitive types (VBA Integer
=> .net Int16
; VBA Long
=> .net Int32
, strings, etc.) to a COM-visible library which doesn't even need to use Excel interop; it will be COM interop instead of Excel interop and the performance hit would be significantly reduced, though still present (it remains .net managed code "talking" to unmanaged COM).
The VBA code's responsibility would be to read and write to/from the worksheets, and the COM-visible library's role would be to implement the calculations to be made out of the values passed by the VBA code; of course the VBA code would be easily accessible to the user (Alt+F11), but again they couldn't access the actual calculation implementation. And then the VBA code could be password-protected, and the calculation results without the DLL would be either 0
's or #VALUE!
, depending on how the VBA code is implemented.
The key is to leave Excel-specifics in Excel and have the .net library pick up where it can without referencing Excel interop assemblies - for example, the VBA function would take a cell reference, grab its value (perhaps validate it) and pass it to the .net assembly, which would return another primitive type that VBA would return to Excel.
Another plus is that the library code with the proprietary calculations could be reused, if you ever were to "convert" the Excel workbook into, say, a Web application.
For future reference: here are 2 new innovative products to help solve your exact problem:
1) https://HiveLink.io: - Using HiveLink you can create a "lightweight" version of your spreadsheet, which you give to your users but it doesn't contain any of your sensitive VBA or calculation formulas. You strip out the intellectual property from the lightweight spreadsheet, and send invitations for your users to download this spreadsheet. When your users enter their input data, HiveLink delivers the data to your original spreadsheet to process the data and then returns the results back to the user automatically.
The good thing about this approach is that it isn't possible to reverse engineer with .NET reflection, or even assembly code - because you completely remove the code from their computer. This is the safest option, and great if your model suits the input/output roundtrip design - not so great if you require client-side interactive code that happens instantly. If you do require fast client-side interactive code, typically this is less sensitive, and you can use workbook protection, or even compile the basic code using something like DoneEx Excel Compiler, and have protection using a combination of Excel Compiler and HiveLink.
2) http://FCell.io: This allows you to build .NET code directly into the spreadsheet. It even comes with a code editor window in the spreadsheet, kinda like replacing the existing VBA code editor with a .NET code editor. The object model in the code editor is a little less comprehensive than the normal one, but you can also create Task Panes and embed them into the workbook for distribution so that your users don't even need to install anything! I'm not sure how safe your code is being embedded into the workbook, but I'm 100% sure people can get to your code if it's important enough to them.
Further comments Re Mat's Mug helpful response:
Re: Performance cost - actually you might notice significant performance gains by implementing things in .NET, especially if you harness multi-threading, which Excel doesn't do at all with UDFs or macros. I've seen gains of 100x re-writing some of my clients' code in .NET libraries (for heavy processing calculations). You might also find it much easier to maintain and improve .NET code than VBA. I have done a lot of conversion of VBA code and functionality to .NET code and have never had a noticeable performance issue with it when done properly.
Re: Users deserving to get access by reflecting .NET libraries?? - I totally disagree with this. It is extremely easy to get access to .NET code using reflection, maybe even easier than cracking Excel's weak passwords. If you really have sensitive calculations you want to protect then it's best to either use something like HiveLink to completely remove the possibility of reverse engineering, or to use obfuscation to make it much harder/extremely annoying. For obfuscation I use CryptoObfuscator($) and Dotfuscator($$$). I wouldn't share my clients' sensitive model VBA code in .NET libraries for excel without this. I also sometimes use CryptoLicensing to allow them to control who can access their DLL functionality by creating licenses for each user.
Re: Leaving VBA code responsible for writing/reading to/from the worksheet - I'm also highly against this. I would recommend leaving as little code in VBA as possible and doing a high level call to .NET, leaving the reading/writing to the .NET plugin. Use named ranged in your worksheet to identify your inputs/outputs and data locations, and then define those named ranges as constants in your .NET library. In your library you can very easily read from ranges and write to ranges. It is much cleaner and easier to maintain this way.
When building any extension library for Excel where you want to call external functionality from VBA - you have to make your library COM Visible. There are a few ways to do this: I highly recommend avoiding having to register your library with the windows COM list using regsvr32 - avoid it at all costs!
The best way to register your library is to load it using ExcelDna, which allows you to load your DLL dynamically every time you launch Excel, without having to permanently define each class with a COM interface in the registry (nightmare when you update versions). You end up creating an XLL file, which can have the DLL packed inside - and you can tell Excel to load the XLL each time it launches by putting a key in the registry. The benefit is that this doesn't require you to store the whole COM interface in the registry, so it is very easy to manage new versions.
Therefore, your VBA code might be something like:
Sub Button_Click()
Call ThisWorkbook.EnsureLibraryInitialized
Call ThisWorkbook.MyLibrary.ReadRangesAndWriteResults(someInputParam)
End Sub
Note that VBA would look for this method ReadRangesAndWriteResults at run time. If you expose the method for compile time in VBA then you'd have to register its COM interface signature in the registry - worth avoiding at all costs!!!
ThisWorkbook module:
Public MyLibrary as Object
public sub Workbook_Open()
EnsureLibraryInitialized
End Sub
public Sub EnsureLibraryInitialized()
Dim addin As COMAddIn
If MyLibrary is Nothing Then
For Each addin In Application.COMAddIns
If InStr(addin.Description, "MyLibrary.COMHelper") Then
If addin.object Is Nothing Then
'complain it can't connect to library, tell user to reinstall it
Else
Set MyLibrary = addin.object
Exit For
End If
Next
End if
End Sub
Re: Excel Interop and .NET libraries - Good points Mat's Mug, this can be a nightmare when using the normal Microsoft.Office.Interop.Excel. Because you are accessing native code from a managed .NET environment you have to make sure your .NET COM wrapper objects get cleaned up properly or you will get hanging/zombie processes of Excel even after you close Excel and it looks like it has disappeared/closed (look in Task Manager, they might still be there!)
The best way I've found to handle this is to use NetOffice, which is basically a clone of the Excel interop model but created to safely manage all of these problems for you. Best to also be aware of good safe handling of COM objects, eg here and here.
Good luck!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With