What Is The Right Way To Save\load Models In Spark\pyspark
Solution 1:
One way to save a model (shown in Scala, but it is probably similar in Python):
// persist model to HDFS
sc.parallelize(Seq(model), 1).saveAsObjectFile("linReg.model")
The saved model can then be loaded as:
val linRegModel = sc.objectFile[LinearRegressionModel]("linReg.model").first()
See also related question
For more details see (ref)
Solution 2:
As of this pull request, merged on Mar 28, 2015 (a day after your question was last edited), this issue has been resolved.
You just need to clone/fetch the latest version from GitHub (git clone git://github.com/apache/spark.git -b branch-1.3), then build it (following the instructions in spark/README.md) with $ mvn -DskipTests clean package.
Note: I ran into trouble building Spark because Maven was being wonky. I resolved that issue by using $ update-alternatives --config mvn and selecting the 'path' that had Priority: 150, whatever that means. Explanation here.
Solution 3:
I ran into this as well -- it looks like a bug. I have reported it to the Spark JIRA.
Solution 4:
Use a Pipeline from the ML package to train the model, and then use MLWriter and MLReader to save the model and read it back.
from pyspark.ml import Pipeline
from pyspark.ml import PipelineModel
# pipeTrain is the fitted pipeline model; outpath is the directory to save to
pipeTrain.write().overwrite().save(outpath)
model_in = PipelineModel.load(outpath)