I'm desperate to find any DFS that supports Windows. The only one I have found is Hadoop HDFS, but it's very hard to deploy on a large number of Windows machines because it requires Cygwin + SSH.
Almost all DFS systems work only on Linux and only one (HDFS) runs on Windows.
I would be very grateful if somebody could point me to another DFS with Windows support.
From a DFS I need the ability to load-balance files across nodes, compression, and a multi-language API for working with it (I don't need to mount the DFS).
HDFS employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.
HDFS is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
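To make the NameNode/DataNode split mentioned above a bit more concrete, here is a toy in-memory sketch. It is not HDFS code and does not use any Hadoop API; the class names, the 4-byte block size, and the round-robin placement are all assumptions made purely for illustration (real HDFS blocks default to 128 MB and placement is far more involved).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; tiny on purpose, an assumption for this toy model


@dataclass
class DataNode:
    """Stores raw block contents, keyed by block id."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    datanodes: list
    files: dict = field(default_factory=dict)  # path -> [(block_id, datanode_name)]

    def write(self, path: str, data: bytes) -> None:
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{path}#{index}"
            # Hypothetical round-robin placement across the DataNodes.
            node = self.datanodes[index % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, node.name))
        self.files[path] = placements

    def read(self, path: str) -> bytes:
        # Look up block locations in the metadata, then fetch each block.
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(
            by_name[name].blocks[block_id] for block_id, name in self.files[path]
        )


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("/logs/a.txt", b"hello world!")
```

The point of the sketch is only that the NameNode never stores file contents: it keeps the block map, while the bytes live on the DataNodes.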
A distributed file system (DFS) differs from typical file systems (e.g., NTFS and HFS) in that it allows direct host access to the same file data from multiple locations. Indeed, the data behind a DFS can reside in a different location from all of the hosts that access it.
Once the client machine has successfully accessed the data, it can interact with the file system within the specified parameters. Difference between HDFS and NFS: NFS has no built-in fault tolerance, whereas HDFS was designed to survive failures by replicating data.
Microsoft has its own DFS, which is included in Windows Server (I don't know whether it's good or bad).
GPFS is worth considering. It is IBM proprietary, but it has very good scalability, is a full-fledged network file system, and has decent Windows support. NTFS ACLs are preserved by mapping them to NFSv4 ACLs, which works quite well (so long as you don't shoot yourself in the foot by trying to use POSIX permissions as well; chmod will blow away your NFSv4 ACLs).
Lustre is worth a mention, but its Windows support is generally considered very poor and immature.
You might want to check out CloudIQ Storage from Appistry.
(They have closed shop.)
It allows you to take the drives in commodity machines (Linux or Windows) and have them appear as a single namespace accessible via a REST-based API. When you write files to the system, you can define the number of copies you want saved. So if you had 5 machines in your distributed system, you could specify that a file be saved on 2 or 3 (or N) machines for redundancy. If a machine or hard drive crashes, it's not an issue, because other machines hold copies of those files.
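The "save N copies across the machines" idea can be sketched roughly as follows. This is not the CloudIQ Storage API and says nothing about how that product actually places files; it is a minimal illustration that ranks machines by a hash of the machine and file name (in the spirit of rendezvous hashing). The function names and machine names are hypothetical.

```python
import hashlib


def pick_replicas(filename: str, machines: list, copies: int) -> list:
    """Rank machines by a hash of (machine, filename) and keep `copies` of them."""
    if not 1 <= copies <= len(machines):
        raise ValueError("copies must be between 1 and the number of machines")

    def score(machine: str) -> str:
        # Deterministic per (machine, file) pair, so every client agrees.
        return hashlib.sha256(f"{machine}:{filename}".encode()).hexdigest()

    return sorted(machines, key=score)[:copies]


def readable_from(replicas: list, failed: set) -> list:
    """Machines that still hold a copy after the given failures."""
    return [machine for machine in replicas if machine not in failed]


machines = ["m1", "m2", "m3", "m4", "m5"]  # hypothetical 5-machine system
replicas = pick_replicas("report.pdf", machines, copies=3)

# Simulate losing one of the machines that holds a copy.
survivors = readable_from(replicas, failed={replicas[0]})
```

With 3 copies on 5 machines, losing any single machine still leaves two machines that can serve the file, which is the redundancy property described above.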
Check out the Downloads and Community links for a trial version as well as documentation.