I am unable to use the pre-trained pipeline "recognize_entities_dl" provided by the spark-nlp library. I have tried installing different versions of pyspark and the spark-nlp library, with no luck.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
#create or get Spark Session
spark = sparknlp.start()
sparknlp.version()
spark.version
#download, load, and annotate a text by pre-trained pipeline
pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')
2.1.0
recognize_entities_dl download started this may take some time.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
11 #download, load, and annotate a text by pre-trained pipeline
12
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
14 result = pipeline.annotate('Harry Potter is a great movie')
d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
89
90 def __init__(self, name, lang='en', remote_loc=None):
---> 91 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
92 self.light_model = LightPipeline(self.model)
93
d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
50 def downloadPipeline(name, language, remote_loc=None):
51 print(name + " download started this may take some time.")
---> 52 file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
53 if file_size == "-1":
54 print("Can not find the model to download please check the name!")
AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'
Thanks for confirming your Apache Spark version. The pre-trained pipelines and models depend on both the Apache Spark and Spark NLP versions. Apache Spark must be at least 2.4.x to be able to download the pre-trained models/pipelines; on any earlier version you need to train your own models/pipelines instead.
This is the list of all pipelines, and they are all for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines
If you take a look at the URL of any model or pipeline, you can see this information encoded in the filename:
recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip

- Name: recognize_entities_dl
- Language: en
- Spark NLP version: 2.1.0 or greater
- Apache Spark version: 2.4.x or greater

NOTE: The Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why models and pipelines are only available for the 2.4.x version.
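If it helps, the naming convention above can be pulled apart programmatically. Here is a small helper (a hypothetical utility, not part of the spark-nlp API) that splits an artifact filename into its components:

```python
def parse_artifact_name(filename):
    """Split a Spark NLP artifact filename into its parts.

    Expected pattern:
    <name>_<lang>_<spark_nlp_version>_<apache_spark_version>_<timestamp>.zip
    """
    base = filename.rsplit(".zip", 1)[0]
    # Split on the 4 rightmost underscores; the pipeline name itself
    # may contain underscores, so splitting from the right is required.
    name, lang, nlp_version, spark_version, timestamp = base.rsplit("_", 4)
    return {
        "name": name,
        "lang": lang,
        "spark_nlp_version": nlp_version,
        "apache_spark_version": spark_version,
        "timestamp": timestamp,
    }

info = parse_artifact_name("recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip")
# info["apache_spark_version"] is "2.4", so this artifact requires Spark 2.4.x
```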
NOTE 2: Since you are using Windows, you need to use the _noncontrib models and pipelines, which are compatible with Windows. See: Do Spark-NLP pretrained pipelines only work on linux systems?
I hope this answer helps and solves your issue.
UPDATE April 2020: Apparently the models and pipelines trained and uploaded on Apache Spark 2.4.x are compatible with Apache Spark 2.3.x as well. So if you are on Apache Spark 2.3.x, even though you cannot use pretrained() for auto-download, you can download the model manually and use .load() instead.
Full list of all models and pipelines with links to download: https://github.com/JohnSnowLabs/spark-nlp-models
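For the manual route, a minimal sketch looks like the following. It assumes you have already downloaded the pipeline zip from the link above and extracted it to a local directory (the path below is hypothetical), and that a Spark session started via sparknlp.start() is active. PipelineModel.load() is the standard pyspark.ml loader, and LightPipeline comes from spark-nlp itself:

```python
import sparknlp
from pyspark.ml import PipelineModel
from sparknlp.base import LightPipeline

spark = sparknlp.start()

# Load the manually downloaded and extracted pipeline from disk
# instead of calling PretrainedPipeline (which auto-downloads).
pipeline = PipelineModel.load("./recognize_entities_dl_en_2.1.0_2.4_1562946909722")

# Wrap it in a LightPipeline to annotate plain strings directly.
result = LightPipeline(pipeline).annotate("Harry Potter is a great movie")
```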
Update: After the 2.4.0 release, all models and pipelines are cross-platform, and there is no need to choose a different model/pipeline for any specific OS: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.4.0
For newer releases: https://github.com/JohnSnowLabs/spark-nlp/releases