I want to understand whether Netezza or Hadoop is the right choice for the following purposes:
1. Pull feed files from several online sources; the files are of considerable size, at times more than a GB.
2. Clean, filter, transform, and compute further information from the feeds.
3. Generate metrics on different dimensions, similar to what data warehouse cubes provide (see the sketch after this list).
4. Help web apps access the final data and metrics quickly, using SQL or other standard mechanisms.
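For context, the kind of aggregation meant in point 3 could look like the standard SQL below; the table and column names (feed_events, event_date, source_id, amount) are purely illustrative, not from any real system.

    -- Roll metrics up by day and source, a simple cube-like slice
    SELECT
        event_date,
        source_id,
        COUNT(*)    AS event_count,
        SUM(amount) AS total_amount,
        AVG(amount) AS avg_amount
    FROM feed_events
    GROUP BY event_date, source_id;

A web app could then read from this pre-aggregated result (or a table materialized from it) instead of scanning the raw feed rows.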
The IBM® Netezza® data warehouse appliance includes a highly optimized SQL dialect called IBM Netezza Structured Query Language (SQL). You can use SQL commands to create and manage your Netezza databases, control user access and permissions for those databases, and query their contents.
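As a rough sketch, typical Netezza SQL statements for those tasks look like the following; the database, user, and table names are made up, and exact privilege options can vary by release.

    -- Create a database and a user, grant access, then query a table
    CREATE DATABASE salesdb;
    CREATE USER analyst WITH PASSWORD 'changeMe1';
    GRANT SELECT ON sales_fact TO analyst;
    SELECT COUNT(*) FROM sales_fact;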
The IBM Netezza appliance uses various software applications to conduct its day-to-day system tasks and to allow users to run administration and data analysis tasks. The IBM Netezza host servers run the Red Hat Enterprise Linux® (RHEL) operating system.
Netezza is the advanced analytics and warehousing solution provided by IBM. It has since been rebranded as IBM PureData System for Analytics (PDA).
How it works:
As data is loaded into the Appliance, it intelligently distributes each table across the 108 SPUs. The hard disk is typically the slowest part of a computer; imagine 108 of them spinning up at once, each loading a small piece of the table. This is how Netezza achieves load rates of roughly 500 gigabytes an hour.
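One common way to drive such loads is through Netezza external tables, which stream a delimited file from the host straight into a target table. A minimal sketch, with the file path, table, and column definitions assumed for illustration:

    -- Define an external table over a delimited feed file on the host
    CREATE EXTERNAL TABLE feed_ext (
        event_date  DATE,
        source_id   INTEGER,
        amount      NUMERIC(12,2)
    )
    USING (
        DATAOBJECT ('/data/feeds/feed_20130101.csv')
        DELIMITER ','
    );

    -- Bulk-insert the file contents; the Appliance spreads the rows across the SPUs
    INSERT INTO feed_events SELECT * FROM feed_ext;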
After a piece of the table is loaded and stored on each SPU (a computer on an integrated circuit card), each column is analyzed to gather descriptive statistics such as minimum and maximum values. These values are stored on each of the 108 SPUs instead of indexes, which take time to create and update and take up unnecessary space.
Imagine your environment without the need to create indexes.
When it is time to query the data, a master computer inside the Appliance asks the SPUs which of them contain the required data. Only the SPUs that hold appropriate data return information, so less data moves across the network to the Business Intelligence/Analytics server.
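In practice this means a query with a selective range predicate, such as the sketch below (reusing the illustrative table and columns from above), lets the stored min/max values rule out whole disk extents, so only the SPUs and extents holding relevant rows do any work.

    -- The range predicate lets the stored min/max statistics skip irrelevant extents
    SELECT source_id, SUM(amount) AS total_amount
    FROM feed_events
    WHERE event_date BETWEEN '2013-01-01' AND '2013-01-07'
    GROUP BY source_id;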
For joining data, it gets even better. The Appliance distributes the data of multiple tables across multiple SPUs by a key, so each SPU holds partial data for several tables. It joins the parts of each table locally on each SPU and returns only the local result. All of the local results are assembled inside the cabinet and then returned to the Business Intelligence/Analytics server as a single query result. This methodology also contributes to the speed story.
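A sketch of how this looks in Netezza DDL: distributing both tables on the join key lets each SPU join its own slices locally. The table and column names here are illustrative, not from the original post.

    -- Both tables distributed on the same key, so the join is co-located on each SPU
    CREATE TABLE customers (
        customer_id INTEGER,
        region      VARCHAR(50)
    ) DISTRIBUTE ON (customer_id);

    CREATE TABLE orders (
        order_id    INTEGER,
        customer_id INTEGER,
        amount      NUMERIC(12,2)
    ) DISTRIBUTE ON (customer_id);

    -- Each SPU joins its local customer and order rows and returns only a partial result
    SELECT c.region, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region;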
The key to all of this is less movement of data across the network. The Appliance returns only the required data to the Business Intelligence/Analytics server across the organization's 1000/100 Mbit network. This is very different from traditional processing, where the Business Intelligence/Analytics software typically extracts most of the data from the database and does its processing on its own server. Here, the database does the work of determining which data is needed and returns only a smaller result set to the Business Intelligence/Analytics server.
Backup And Redundancy
To understand how the data and system are set up for almost 100% uptime, it is important to understand the internal design. The Appliance uses the outer, fastest one-third of each 400 GB disk for data storage and retrieval. Another third of the disk stores descriptive statistics, and the remaining third stores a hot backup of data from other SPUs. Each Appliance cabinet also contains 4 additional SPUs for automatic failover of any of the 108 SPUs.
Taken from http://www2.sas.com