Alternative to CSV?

Tags: rest, csv, plaintext

I intend to build a RESTful service that will return data in a custom text format. Given my very large volumes of data, XML/JSON is too verbose. I'm looking for a row-based text format.

CSV is an obvious candidate, but I'm wondering whether there is something better out there. The only alternatives I've found through a bit of research are CTX and Fielded Text.

I'm looking for a format which offers the following:

Plain text, easy to read
Very easy to parse on most software platforms
Column definitions can change without requiring changes to software clients
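For reference, here is a minimal sketch in Python (standard library only) of how far plain CSV already goes toward the third requirement. If the header row names the columns and clients look fields up by name rather than by position, added or reordered columns do not break them. The feed contents and the `read_prices` helper are hypothetical.

```python
import csv
import io

# Hypothetical server output: the header row names the columns.
# "Version 2" of the feed adds a "unit" column and reorders the others.
FEED_V1 = "id,name,price\n1,widget,9.99\n2,gadget,19.50\n"
FEED_V2 = "price,unit,id,name\n9.99,EUR,1,widget\n19.50,EUR,2,gadget\n"


def read_prices(text: str) -> dict[str, float]:
    """Map name -> price, looking columns up by header name, not position."""
    reader = csv.DictReader(io.StringIO(text))
    # Columns this client does not know about (like "unit") are ignored.
    return {row["name"]: float(row["price"]) for row in reader}


if __name__ == "__main__":
    print(read_prices(FEED_V1))
    print(read_prices(FEED_V2))
```

Removing or renaming a column the client relies on would still break it. This approach only tolerates additions and reordering.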

Fielded Text is looking pretty good, and I could certainly build a specification myself, but I'm curious what others have done, given that this must be a very old problem. It's surprising that there isn't a better standard out there.

What suggestions do you have?


asked Oct 06 '10 16:10

srmark

1 Answer

Looking through the existing answers, most struck me as a bit dated, especially in terms of 'big data'. Noteworthy alternatives to CSV include:

ORC: 'Optimized Row Columnar', a column-based format (rows are grouped into stripes, and each stripe is stored column by column). Useful in Python/Pandas. Originated in Apache Hive, optimised by Hortonworks. The schema is stored in the file footer. The Wikipedia entry is currently quite terse https://en.wikipedia.org/wiki/Apache_ORC but the Apache project documentation has a lot of detail.
Parquet: similarly column-based, with similar compression. Often used with Cloudera Impala.
Avro: from Apache Hadoop. Row-based, with a schema defined in JSON. Less well supported in Pandas. Often found in Apache Kafka clusters.
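To make "a schema defined in JSON" concrete, the following sketch writes a hypothetical Avro record schema as a Python dict. It follows the specification's `type`/`name`/`fields` layout. The sketch also defines a simplified, hypothetical `conforms` check. This check is not the API of any Avro library. Real Avro libraries validate records themselves and encode them in a compact binary form.

```python
import json

# Hypothetical record schema; the "type"/"name"/"fields" layout follows
# the Avro specification.
SCHEMA = {
    "type": "record",
    "name": "Trade",
    "fields": [
        {"name": "symbol", "type": "string"},
        {"name": "price", "type": "double"},
        {"name": "quantity", "type": "long"},
        # A union with "null" plus a default makes the field optional, so a
        # reader using this schema can still read records written without it.
        {"name": "venue", "type": ["null", "string"], "default": None},
    ],
}

# Rough mapping from Avro primitive names to Python types (sketch only).
PY_TYPES = {"string": str, "double": float, "long": int, "null": type(None)}


def conforms(record: dict, schema: dict) -> bool:
    """Hypothetical, simplified check that a dict matches a record schema."""
    for field in schema["fields"]:
        ftype = field["type"]
        types = ftype if isinstance(ftype, list) else [ftype]
        # Missing fields fall back to the default (None if there is none).
        value = record.get(field["name"], field.get("default"))
        if not any(isinstance(value, PY_TYPES[t]) for t in types):
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))  # the schema as a JSON document
    print(conforms({"symbol": "ABC", "price": 1.5, "quantity": 10}, SCHEMA))
```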

All are splittable, all are binary and therefore not human-readable, all describe their content with a schema, and all work with Hadoop. The column-based formats are considered best where accumulated data are read often. For write-heavy workloads, Avro may be better suited. See e.g. https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/
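The row-versus-column distinction can be pictured with an in-memory analogy. The minimal sketch below uses hypothetical sensor readings. It pivots row records into one list per column, so that an aggregate over one field reads only that column's values. This only illustrates the idea. It is not how ORC or Parquet actually encode data on disk.

```python
# Hypothetical row-oriented records (what Avro stores one after another).
ROWS = [
    {"sensor": "s1", "ts": 1, "value": 0.5},
    {"sensor": "s1", "ts": 2, "value": 0.75},
    {"sensor": "s2", "ts": 1, "value": 1.25},
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


if __name__ == "__main__":
    columns = to_columns(ROWS)
    # Row layout: aggregating one field still visits every whole record.
    total_rows = sum(row["value"] for row in ROWS)
    # Column layout: the same aggregate only touches the "value" column.
    total_cols = sum(columns["value"])
    print(columns["sensor"], total_rows == total_cols)
```

Appending a new record is simpler in the row layout, because it goes on the end of one list. The column layout needs one append per column. This mirrors, loosely, the read-heavy versus write-heavy trade-off described above.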

The column formats can be compressed with Snappy (faster) or gzip (slower, but a higher compression ratio).
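In pandas, the Parquet codec is chosen at write time through the `compression` argument of `DataFrame.to_parquet`, which defaults to `"snappy"`. Snappy is not in Python's standard library, so this stand-alone sketch uses gzip at two levels instead. It shows the same speed-versus-size trade-off on a hypothetical low-cardinality column. Columnar files keep such values next to each other, and repetitive runs like this compress very well.

```python
import gzip

# Hypothetical column of low-cardinality values stored contiguously,
# as they would be inside a columnar file.
column = ("EUR,USD,EUR,EUR,GBP," * 2000).encode("ascii")

# gzip levels stand in for the Snappy/gzip choice: level 1 favours speed,
# level 9 spends more CPU time to produce smaller output.
fast = gzip.compress(column, compresslevel=1)
small = gzip.compress(column, compresslevel=9)

if __name__ == "__main__":
    print(len(column), len(fast), len(small))
    # Compression is lossless: the original bytes come back unchanged.
    print(gzip.decompress(small) == column)
```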

You may also want to look into Protocol Buffers, Pickle (Python-specific) and Feather (for fast communication between Python and R).
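The following sketch shows why Pickle counts as Python-specific. The payload is hypothetical. The output is an opaque byte string that is meant to be read back by Python's own `pickle` module. The Python documentation also warns that unpickling untrusted data can execute arbitrary code, which matters for a public REST service.

```python
import pickle

# Hypothetical payload: a list of row dicts.
rows = [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}]

# Binary, Python-only encoding; clients in other languages cannot
# easily decode it.
blob = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)

# Only unpickle data from a trusted source.
restored = pickle.loads(blob)

if __name__ == "__main__":
    print(type(blob).__name__, restored == rows)
```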

answered Sep 29 '22 06:09

Jo van Schalkwyk
