
Broadcast A User Defined Class In Spark

I am trying to broadcast a user-defined variable in a PySpark application, but I always get the following error: File '/usr/local/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/

Solution 1:

After placing the FooMap class in a separate module, everything works fine. The reason is that a class defined in the script you run directly lives in the __main__ module; pickle records the broadcast object's class as __main__.FooMap, and the worker processes cannot resolve that name when they deserialize the broadcast.
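For illustration, here is a minimal sketch of the failing pattern (the file name and values are made up for the example). With everything in one script, FooMap is defined in __main__, so the workers cannot resolve the pickled class when they deserialize the broadcast:

# broken_job.py -- the whole job in one script (this is the problem)
import pyspark

class FooMap(object):  # defined in __main__ when this file is run directly
    def map(self, value):
        return 2 * value

if __name__ == '__main__':
    sc = pyspark.SparkContext('local', 'FooMap')
    b = sc.broadcast(FooMap())  # pickled by reference as __main__.FooMap
    rdd = sc.parallelize([1, 2, 3])
    # The workers cannot import __main__.FooMap, so this action fails
    # with a pickling/attribute error like the one in the question.
    print(rdd.map(lambda x: b.value.map(x)).collect())
    sc.stop()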

Solution 2:

This is an addition to the previous answer.

You should import FooMap from another file, not just define it in the current file.

Maybe something like this. In foo_map.py:

class FooMap(object):

    def __init__(self):
        # Build a fixed lookup table: keys 0..9 map to twice their value.
        keys = list(range(10))
        values = [2 * key for key in keys]
        self._map = dict(zip(keys, values))

    def map(self, value):
        # Return the mapped value, or -1 for keys outside the table.
        if value not in self._map:
            return -1
        return self._map[value]
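As a quick sanity check, the class behaves like this on its own (the values follow directly from the table built above):

fm = FooMap()
print(fm.map(3))   # 6: 3 is in the table, mapped to 2 * 3
print(fm.map(42))  # -1: 42 is outside the keys 0..9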

then in sparkbrad.py:

import random

import pyspark

from foo_map import FooMap


class FooMapJob(object):

    def __init__(self, inputs):
        self._inputs = inputs
        self._foomap = FooMap()

    def run(self):
        sc = pyspark.SparkContext('local', 'FooMap')
        input_ = sc.parallelize(self._inputs, 4)
        # Broadcast the FooMap instance once instead of shipping a copy
        # inside every task closure.
        b = sc.broadcast(self._foomap)
        output = input_.map(lambda item: b.value.map(item))
        # toLocalIterator() is what actually triggers the job, so consume
        # the output before releasing the broadcast.
        result = list(output.toLocalIterator())
        b.unpersist()
        sc.stop()
        return result


def main():
    inputs = [random.randint(0, 10) for _ in range(10)]
    job = FooMapJob(inputs)
    print(job.run())

if __name__ == '__main__':
    main()
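One caveat worth adding (standard PySpark practice, not part of the answers above): this runs as-is in local mode, but on a real cluster the workers also need foo_map.py on their Python path, or unpickling the broadcast fails again for the same reason. The module can be shipped with the job, for example:

# At submit time:
#     spark-submit --py-files foo_map.py sparkbrad.py
# or programmatically, before the broadcast is first used:
sc.addPyFile('foo_map.py')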
