I need to create AVRO file but for that I need 2 things:
1) JSON
2) Avro Schema
From these 2 requirements - I have JSON:
{"web-app": {
"servlet": [
{
"servlet-name": "cofaxCDS",
"servlet-class": "org.cofax.cds.CDSServlet",
"init-param": {
"configGlossary:installationAt": "Philadelphia, PA",
"configGlossary:adminEmail": "[email protected]",
"configGlossary:poweredBy": "Cofax",
"configGlossary:poweredByIcon": "/images/cofax.gif",
"configGlossary:staticPath": "/content/static",
"templateProcessorClass": "org.cofax.WysiwygTemplate",
"templateLoaderClass": "org.cofax.FilesTemplateLoader",
"templatePath": "templates",
"templateOverridePath": "",
"defaultListTemplate": "listTemplate.htm",
"defaultFileTemplate": "articleTemplate.htm",
"useJSP": false,
"jspListTemplate": "listTemplate.jsp",
"jspFileTemplate": "articleTemplate.jsp",
"cachePackageTagsTrack": 200,
"cachePackageTagsStore": 200,
"cachePackageTagsRefresh": 60,
"cacheTemplatesTrack": 100,
"cacheTemplatesStore": 50,
"cacheTemplatesRefresh": 15,
"cachePagesTrack": 200,
"cachePagesStore": 100,
"cachePagesRefresh": 10,
"cachePagesDirtyRead": 10,
"searchEngineListTemplate": "forSearchEnginesList.htm",
"searchEngineFileTemplate": "forSearchEngines.htm",
"searchEngineRobotsDb": "WEB-INF/robots.db",
"useDataStore": true,
"dataStoreClass": "org.cofax.SqlDataStore",
"redirectionClass": "org.cofax.SqlRedirection",
"dataStoreName": "cofax",
"dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
"dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
"dataStoreUser": "sa",
"dataStorePassword": "dataStoreTestQuery",
"dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
"dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
"dataStoreInitConns": 10,
"dataStoreMaxConns": 100,
"dataStoreConnUsageLimit": 100,
"dataStoreLogLevel": "debug",
"maxUrlLength": 500}},
{
"servlet-name": "cofaxEmail",
"servlet-class": "org.cofax.cds.EmailServlet",
"init-param": {
"mailHost": "mail1",
"mailHostOverride": "mail2"}},
{
"servlet-name": "cofaxAdmin",
"servlet-class": "org.cofax.cds.AdminServlet"},
{
"servlet-name": "fileServlet",
"servlet-class": "org.cofax.cds.FileServlet"},
{
"servlet-name": "cofaxTools",
"servlet-class": "org.cofax.cms.CofaxToolsServlet",
"init-param": {
"templatePath": "toolstemplates/",
"log": 1,
"logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
"logMaxSize": "",
"dataLog": 1,
"dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
"dataLogMaxSize": "",
"removePageCache": "/content/admin/remove?cache=pages&id=",
"removeTemplateCache": "/content/admin/remove?cache=templates&id=",
"fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
"lookInContext": 1,
"adminGroupID": 4,
"betaServer": true}}],
"servlet-mapping": {
"cofaxCDS": "/",
"cofaxEmail": "/cofaxutil/aemail/*",
"cofaxAdmin": "/admin/*",
"fileServlet": "/static/*",
"cofaxTools": "/tools/*"},
"taglib": {
"taglib-uri": "cofax.tld",
"taglib-location": "/WEB-INF/tlds/cofax.tld"}}}
But how to create AVRO Schema based on it?
Looking for programatic way to do that since will have many schemas and can not create Avro Schema manually every time.
I checked 'avro-tools-1.8.1.jar' but that can not create Avro Schema from JSON directly.
Looking for a Jar or Python code that can create JSON -> Avro schema. It is ok if Data Types are not perfect (Strings, Integers and Floats are good enough for start).
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
Avro has a JSON like data model, but can be represented as either JSON or in a compact binary form. It comes with a very sophisticated schema description language that describes data. We think Avro is the best choice for a number of reasons: It has a direct mapping to and from JSON.
Give this one a shot. http://www.dataedu.ca/avro
It basically infers the Avro schema that accepts the JSON.
You can even give it a JSON array. What it would do is generating an Avro schema that is compatible with all the JSON documents in your array.
There are other tools that you can verify the result.
you can use Kite SDK util to infer avro schema from a json input.
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/JsonUtil.java#L539
Example:
String json = "{\n" +
" \"id\": 1,\n" +
" \"name\": \"A green door\",\n" +
" \"price\": 12.50,\n" +
" \"tags\": [\"home\", \"green\"]\n" +
"}\n"
;
String avroSchema = JsonUtil.inferSchema(JsonUtil.parse(json), "myschema").toString();
System.out.println(avroSchema);
Result:
{
"type":"record",
"name":"myschema",
"fields":[
{
"name":"id",
"type":"int",
"doc":"Type inferred from '1'"
},
{
"name":"name",
"type":"string",
"doc":"Type inferred from '\"A green door\"'"
},
{
"name":"price",
"type":"double",
"doc":"Type inferred from '12.5'"
},
{
"name":"tags",
"type":{
"type":"array",
"items":"string"
},
"doc":"Type inferred from '[\"home\",\"green\"]'"
}
]
}
You can find the maven dependency here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With