I have a .txt file that contains JSON. I want to read, manipulate, and restructure the file (for example, rename fields). How can I do this in Python with Apache Beam?
To read a JSON file with Apache Beam in Python, you can write a custom coder. This works when each line of the file is one JSON document.
See: https://beam.apache.org/documentation/programming-guide/#specifying-coders
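import json
import apache_beam as beam

class JsonCoder(beam.coders.Coder):
    """A JSON coder interpreting each line as a JSON string."""
    def encode(self, x):
        # Coders must return bytes, so encode the JSON text as UTF-8.
        return json.dumps(x).encode('utf-8')
    def decode(self, x):
        # x is the raw bytes of one line of the file.
        return json.loads(x.decode('utf-8'))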
Then specify it when you read or write your data, for instance:
lines = p | 'read_data' >> ReadFromText(known_args.input, coder=JsonCoder())
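For the renaming part of the question, here is a minimal sketch of one possible approach. Note that it parses each line with beam.Map(json.loads) rather than with the custom coder above, to keep the example short. The helper rename_fields, the FIELD_MAPPING dictionary with its field names, and the run function are all hypothetical names made up for illustration; adapt them to your own data.

```python
import json


def rename_fields(record, mapping):
    """Return a copy of `record` with keys renamed according to `mapping`.

    Keys that do not appear in `mapping` are kept unchanged.
    """
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical mapping from old field names to new ones.
FIELD_MAPPING = {"fname": "first_name", "lname": "last_name"}


def run(input_path, output_path):
    """Read JSON lines, rename fields, and write JSON lines back out."""
    # Imported here so rename_fields can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import ReadFromText, WriteToText

    with beam.Pipeline() as p:
        (
            p
            | "read_data" >> ReadFromText(input_path)  # one string per line
            | "parse_json" >> beam.Map(json.loads)  # string -> dict
            | "rename_fields" >> beam.Map(rename_fields, FIELD_MAPPING)
            | "to_json" >> beam.Map(json.dumps)  # dict -> string
            | "write_data" >> WriteToText(output_path)
        )
```

Calling run(known_args.input, known_args.output) would execute the pipeline, assuming your argument parser defines both options. Any other per-record manipulation can go into a function like rename_fields and be applied with another beam.Map step.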
Best regards, work well ;)