Broadcast A User Defined Class In Spark
I am trying to broadcast a user-defined variable in a PySpark application, but I always get the following error: File '/usr/local/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/
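A minimal sketch of the pattern that typically triggers this error (an assumed reproduction, since the asker's exact code is not shown): a class defined in the driver script itself lives in the __main__ module, and pickle records classes by a reference to their defining module, so the workers cannot re-import it when they deserialize the broadcast.

from pyspark import SparkContext

class FooMap(object):  # defined in __main__: workers cannot re-import this
    def map(self, value):
        return 2 * value

if __name__ == '__main__':
    sc = SparkContext('local', 'FooMapBroken')
    b = sc.broadcast(FooMap())  # pickled by reference to __main__.FooMap
    # Fails when the task deserializes the broadcast on a worker,
    # raising an error from inside pyspark.zip like the one above.
    print(sc.parallelize(range(5)).map(lambda x: b.value.map(x)).collect())
    sc.stop()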
Solution 1:
Placing the FooMap class in a separate module makes everything work fine. This is because pickle serializes a class by reference to its defining module: a class defined in the driver's __main__ module cannot be re-imported on the workers, while one imported from its own module can.
Solution 2:
This is an addition to the previous answer. You should import FooMap from another file rather than defining it in the current file, for example like this, in foo_map.py:
class FooMap(object):
    def __init__(self):
        # Build a lookup table mapping each key 0-9 to twice its value.
        keys = list(range(10))
        values = [2 * key for key in keys]
        self._map = dict(zip(keys, values))

    def map(self, value):
        if value not in self._map:
            return -1
        return self._map[value]
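As a quick sanity check (my addition, assuming foo_map.py is on the Python path), FooMap can be exercised locally before involving Spark at all:

from foo_map import FooMap

fm = FooMap()
assert fm.map(3) == 6    # key in the table: returns the doubled value
assert fm.map(42) == -1  # key not in the table: falls back to -1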
Then, in sparkbrad.py:
import random

from pyspark import SparkContext

from foo_map import FooMap


class FooMapJob(object):
    def __init__(self, inputs):
        self._inputs = inputs
        self._foomap = FooMap()

    def run(self):
        sc = SparkContext('local', 'FooMap')
        input_ = sc.parallelize(self._inputs, 4)
        # Broadcasting works now because FooMap lives in an importable module.
        b = sc.broadcast(self._foomap)
        output = input_.map(lambda item: b.value.map(item))
        # unpersist() only drops cached copies on the executors; the broadcast
        # is re-sent if it is still needed when the job below actually runs.
        b.unpersist()
        result = list(output.toLocalIterator())
        sc.stop()
        return result


def main():
    inputs = [random.randint(0, 10) for _ in range(10)]
    job = FooMapJob(inputs)
    print(job.run())


if __name__ == '__main__':
    main()
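A simpler variant worth noting (my suggestion, not part of the original answers): if the class exists only to wrap a dict, the module-placement issue can be sidestepped entirely by broadcasting the plain dict, since built-in types unpickle on the workers without needing any user module.

from pyspark import SparkContext

sc = SparkContext('local', 'FooMapDict')
table = {key: 2 * key for key in range(10)}
b = sc.broadcast(table)  # a plain dict needs no user module on the workers
output = sc.parallelize(range(10), 4).map(lambda x: b.value.get(x, -1))
print(output.collect())
sc.stop()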