Pyspark - converting json string to DataFrame

I have a test2.json file that contains simple JSON:

{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}

I uploaded the file to blob storage and created a DataFrame from it:

df = spark.read.json("/example/data/test2.json")

Then I can display it without any problems:

df.show()
+------+-----------+------+---------+--------------------+
|Author|BlogEntries|Caller|     Name|                 Url|
+------+-----------+------+---------+--------------------+
|jangcy|        100|jangcy|something|https://stackover...|
+------+-----------+------+---------+--------------------+

Second scenario: I have exactly the same JSON string declared in my notebook:

newJson = '{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}'

I can print it and so on, but when I try to create a DataFrame from it:

df = spark.read.json(newJson)

I get the 'Relative path in absolute URI' error:

'java.net.URISyntaxException: Relative path in absolute URI: {  "Name":%20%22something%22,%20%20%22Url%22:%20%22https:/stackoverflow.com%22,%20%20%22Author%22:%20%22jangcy%22,%20%20%22BlogEntries%22:%20100,%20%20%22Caller%22:%20%22jangcy%22%7D'
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 249, in json
    return self._df(self._jreader.json(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI: {  "Name":%20%22something%22,%20%20%22Url%22:%20%22https:/stackoverflow.com%22,%20%20%22Author%22:%20%22jangcy%22,%20%20%22BlogEntries%22:%20100,%20%20%22Caller%22:%20%22jangcy%22%7D'

Should I apply additional transformations to the newJson string? If so, what should they be? Please forgive me if this is too trivial; I am very new to Python and Spark.

I am using a Jupyter notebook with the PySpark3 kernel.

Thanks in advance.

asked Apr 05 '18 15:04

Jangcy

1 Answer

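spark.read.json treats a plain string argument as a file path, which is why the JSON text ends up in a URISyntaxException. It also accepts an RDD of strings where each element is one JSON document, so you can wrap the string in an RDD (sc is the SparkContext; if it is not predefined in your session, use spark.sparkContext):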

newJson = '{"Name":"something","Url":"https://stackoverflow.com","Author":"jangcy","BlogEntries":100,"Caller":"jangcy"}'
df = spark.read.json(sc.parallelize([newJson]))
df.show(truncate=False)

which should give

+------+-----------+------+---------+-------------------------+
|Author|BlogEntries|Caller|Name     |Url                      |
+------+-----------+------+---------+-------------------------+
|jangcy|100        |jangcy|something|https://stackoverflow.com|
+------+-----------+------+---------+-------------------------+
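
If the string may be pretty-printed over several lines, or may hold a JSON array of objects, one option is to normalize it with Python's standard json module before handing it to Spark. The following is only a sketch under those assumptions: json_string_to_rows and json_string_to_df are hypothetical helper names, not part of PySpark, and spark is assumed to be an existing SparkSession.

```python
import json

# Same JSON string as in the question.
newJson = '{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}'


def json_string_to_rows(json_str):
    """Parse a JSON string into a list of dicts, one per record.

    Hypothetical helper: a single JSON object becomes a one-element list,
    a JSON array is returned as a list of its elements.
    """
    parsed = json.loads(json_str)  # raises ValueError if the string is not valid JSON
    return parsed if isinstance(parsed, list) else [parsed]


def json_string_to_df(spark, json_str):
    """Build a DataFrame from a JSON string (hypothetical helper).

    `spark` is assumed to be an existing SparkSession. Each record is
    re-serialized onto a single line, because spark.read.json expects one
    JSON document per RDD element.
    """
    lines = [json.dumps(row) for row in json_string_to_rows(json_str)]
    return spark.read.json(spark.sparkContext.parallelize(lines))


rows = json_string_to_rows(newJson)
```

In the notebook this would be called as df = json_string_to_df(spark, newJson), followed by df.show(truncate=False). Parsing first also means malformed JSON fails early in Python rather than partway through a Spark job.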

answered Oct 27 '22 10:10

Ramesh Maharjan

Tags: python, jupyter-notebook, apache-spark, pyspark
