I have a number of data files created by many different programs. Is there a way to determine which database, and which version of that database, was used to create each data file?
For example, I'd like to identify which files were created by Microsoft Access, dBASE, FileMaker, FoxPro, SQLite or others.
I really just want to quickly scan the files and display information about them, including the source database and its version.
For reference, I'm using Delphi 2009.
First of all, check the file extension. Take a look at the corresponding Wikipedia article, or at other sites that list file extensions.
Then you can guess the file format from its so-called "signature".
This is usually the first few bytes of the file content, which are often enough to identify the file format.
An up-to-date list of signatures is maintained on Gary Kessler's very nice website.
For instance, here is how our framework identifies the MIME type from the file content, on the server side:
function GetMimeContentType(Content: Pointer; Len: integer;
  const FileName: TFileName=''): RawUTF8;
begin // see http://www.garykessler.net/library/file_sigs.html for magic numbers
  result := '';
  if (Content<>nil) and (Len>4) then
    case PCardinal(Content)^ of
    $04034B50: Result := 'application/zip'; // 50 4B 03 04
    $46445025: Result := 'application/pdf'; // 25 50 44 46 2D 31 2E
    $21726152: Result := 'application/x-rar-compressed'; // 52 61 72 21 1A 07 00
    $AFBC7A37: Result := 'application/x-7z-compressed'; // 37 7A BC AF 27 1C
    $75B22630: Result := 'audio/x-ms-wma'; // 30 26 B2 75 8E 66
    $9AC6CDD7: Result := 'video/x-ms-wmv'; // D7 CD C6 9A 00 00
    $474E5089: Result := 'image/png'; // 89 50 4E 47 0D 0A 1A 0A
    $38464947: Result := 'image/gif'; // 47 49 46 38
    $002A4949, $2A004D4D, $2B004D4D:
      Result := 'image/tiff'; // 49 49 2A 00 or 4D 4D 00 2A or 4D 4D 00 2B
    $E011CFD0: // Microsoft Office applications D0 CF 11 E0 = DOCFILE
      if Len>600 then
        case PWordArray(Content)^[256] of // at offset 512
        $A5EC: Result := 'application/msword'; // EC A5 C1 00
        $FFFD: // FD FF FF
          case PByteArray(Content)^[516] of
          $0E,$1C,$43: Result := 'application/vnd.ms-powerpoint';
          $10,$1F,$20,$22,$23,$28,$29: Result := 'application/vnd.ms-excel';
          end;
        end;
    else
      case PCardinal(Content)^ and $00ffffff of
      $685A42: Result := 'application/bzip2'; // 42 5A 68
      $088B1F: Result := 'application/gzip'; // 1F 8B 08
      $492049: Result := 'image/tiff'; // 49 20 49
      $FFD8FF: Result := 'image/jpeg'; // FF D8 FF DB/E0/E1/E2/E3/E8
      else
        case PWord(Content)^ of
        $4D42: Result := 'image/bmp'; // 42 4D
        end;
      end;
    end;
  if (Result='') and (FileName<>'') then begin
    case GetFileNameExtIndex(FileName,'png,gif,tiff,tif,jpg,jpeg,bmp,doc,docx') of
    0: Result := 'image/png';
    1: Result := 'image/gif';
    2,3: Result := 'image/tiff';
    4,5: Result := 'image/jpeg';
    6: Result := 'image/bmp';
    7,8: Result := 'application/msword';
    else begin
      Result := RawUTF8(ExtractFileExt(FileName));
      if Result<>'' then begin
        Result[1] := '/';
        Result := 'application'+LowerCase(Result);
      end;
    end;
    end;
  end;
  if Result='' then
    Result := 'application/octet-stream';
end;
You can write a similar function for database files, using the signatures from Gary Kessler's list.
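As a rough illustration, the same idea applied to a few of the database formats from the question might look like the sketch below. The names HasSignature, LooksLikeDbfHeader and GetDatabaseFileType are hypothetical and not part of any framework. The sketch assumes the well-known header strings of SQLite 3 ("SQLite format 3" followed by a zero byte at offset 0) and Access ("Standard Jet DB" or "Standard ACE DB" at offset 4), and the usual DBF version byte at offset 0. A single byte is a weak signature, so the DBF branch also checks the month and day of the last-update date stored in the header, and should still be combined with the file extension. FileMaker is left out on purpose: take its signature from the list rather than from this sketch.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils; // CompareMem

// True if the bytes at Content+Offset are exactly the bytes of Sig.
function HasSignature(Content: Pointer; Len, Offset: Integer;
  const Sig: AnsiString): Boolean;
begin
  Result := (Content <> nil) and (Offset >= 0) and (Sig <> '') and
    (Len >= Offset + Length(Sig)) and
    CompareMem(PAnsiChar(Content) + Offset, PAnsiChar(Sig), Length(Sig));
end;

// Weak sanity check for a DBF header (32 bytes minimum): bytes 2 and 3
// hold the month and day of the last update.
function LooksLikeDbfHeader(Content: Pointer; Len: Integer): Boolean;
var
  P: PAnsiChar;
begin
  P := Content;
  Result := (P <> nil) and (Len >= 32) and
    (Ord(P[2]) in [1..12]) and (Ord(P[3]) in [1..31]);
end;

// Hypothetical helper: returns a short description of the database format,
// or '' when none of the signatures below matches.
function GetDatabaseFileType(Content: Pointer; Len: Integer): string;
begin
  Result := '';
  if HasSignature(Content, Len, 0, 'SQLite format 3'#0) then
    Result := 'SQLite 3'
  else if HasSignature(Content, Len, 4, 'Standard Jet DB') then
    Result := 'Microsoft Access (Jet, .mdb)'
  else if HasSignature(Content, Len, 4, 'Standard ACE DB') then
    Result := 'Microsoft Access (ACE, .accdb)'
  else if LooksLikeDbfHeader(Content, Len) then
    // DBF family: the first byte encodes the producing product and version
    case PByte(Content)^ of
      $03: Result := 'dBASE III (no memo)';
      $83: Result := 'dBASE III+ (with memo)';
      $8B: Result := 'dBASE IV (with memo)';
      $30: Result := 'Visual FoxPro';
      $F5: Result := 'FoxPro 2.x (with memo)';
    end;
end;
```

To scan a folder, you would read only the first 32 bytes or so of each file (for instance with a TFileStream) and pass that buffer to GetDatabaseFileType, falling back to the file extension when it returns an empty string.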
There are lots of database engines, with hundreds (if not thousands) of versions and formats (binary, CSV, XML...). Many of them are encrypted to protect the content. Identifying every database and every format is practically impossible, and the formats change constantly.
So first of all, you have to limit your task to a list of the database engines you want to scan. That's what I would do...