If you run the DESCRIBE EXTENDED command on any Hive table, the output shows totalSize and rawDataSize values near the end.
What do these fields mean?
For example:
hive> DESCRIBE EXTENDED <TableName>;
Output:
Table(tableName:TablenameXXXXX, dbName:XXxXXX,
.......... .......................
numRows=116429472, totalSize=3835205544, rawDataSize=35040221600})
rawDataSize is the size of the original, uncompressed data set, and totalSize is the amount of storage the table's files actually take up. Both are in bytes. The distinction matters for the ORC file format: because ORC compresses the data, totalSize will be smaller than rawDataSize.
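To make the ORC point concrete, here is a small Python sketch that pulls the counters out of the output line shown in the question and compares them. Python is used only for illustration (it is not part of Hive), and the helper name parse_size_stats is hypothetical. With these numbers the raw data is roughly 9 times larger than what is stored on disk: about 33 bytes per row on disk versus about 301 bytes per row of raw data.

```python
import re

# Tail of the DESCRIBE EXTENDED output from the question
# (the elided middle part of the output is not reproduced here).
OUTPUT_TAIL = "numRows=116429472, totalSize=3835205544, rawDataSize=35040221600})"


def parse_size_stats(text):
    """Hypothetical helper: extract numRows, totalSize and rawDataSize as ints."""
    stats = {}
    for key in ("numRows", "totalSize", "rawDataSize"):
        match = re.search(rf"\b{key}=(\d+)", text)
        if match:
            stats[key] = int(match.group(1))
    return stats


stats = parse_size_stats(OUTPUT_TAIL)

# How many times larger the raw data is than its on-disk footprint.
ratio = stats["rawDataSize"] / stats["totalSize"]
# Average bytes per row, on disk and as raw data.
on_disk_per_row = stats["totalSize"] / stats["numRows"]
raw_per_row = stats["rawDataSize"] / stats["numRows"]

print(f"rawDataSize / totalSize = {ratio:.2f}")         # 9.14
print(f"bytes per row on disk:  {on_disk_per_row:.1f}")  # 32.9
print(f"raw bytes per row:      {raw_per_row:.1f}")      # 301.0
```

The per-row figures are averages over the whole table, so they say nothing about how individual rows or columns compress.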
The size of the data is described by two statistics:
totalSize: the approximate size of the data on disk
rawDataSize: the approximate size of the data in memory
Hive on MapReduce uses totalSize. When both are available, Hive on Spark uses rawDataSize. Because of compression and serialization, totalSize and rawDataSize can differ greatly for the same dataset.
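The selection rule can be written as a tiny Python sketch. The function planning_size and the engine labels "mr" and "spark" are hypothetical names for illustration, not a Hive API, and the fallback to totalSize when rawDataSize is missing is an assumption of this sketch. Applied to the table from the question, the same table looks like roughly 3.8 GB to Hive on MapReduce but roughly 35 GB to Hive on Spark.

```python
from typing import Optional


def planning_size(engine: str,
                  total_size: Optional[int],
                  raw_data_size: Optional[int]) -> Optional[int]:
    """Hypothetical helper: return the size statistic the engine looks at.

    Hive on MapReduce uses totalSize; Hive on Spark uses rawDataSize when
    both statistics are available. Falling back to totalSize otherwise is
    an assumption, not something stated in the answer.
    """
    if engine == "mr":
        return total_size
    if engine == "spark":
        if total_size is not None and raw_data_size is not None:
            return raw_data_size
        return total_size  # assumed fallback
    raise ValueError(f"unknown engine: {engine!r}")


# Statistics from the DESCRIBE EXTENDED output in the question (bytes).
TOTAL_SIZE = 3835205544
RAW_DATA_SIZE = 35040221600

print(planning_size("mr", TOTAL_SIZE, RAW_DATA_SIZE))     # 3835205544
print(planning_size("spark", TOTAL_SIZE, RAW_DATA_SIZE))  # 35040221600
```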