
How To Package Vocabulary File For Cloud ML Engine

I have a .txt file which contains a different label on each line. I use this file to create a label index lookup file, for example: label_index = tf.contrib.lookup.index_table_from

Solution 1:

You have multiple options. I think the most straightforward is to store labels.txt in a GCS location.
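As a rough sketch of the GCS option: the helper below reads one label per line through whatever file opener you hand it. The name load_labels and the bucket path in the comment are made up for illustration. The assumption is that, with TensorFlow installed, you would pass tf.io.gfile.GFile as the opener so that gs:// paths work, while the built-in open is enough for a local copy.

```python
def load_labels(path, opener=open):
    """Return the non-empty lines of a label file, in file order."""
    with opener(path, 'r') as f:
        return [line.strip() for line in f if line.strip()]


# Hypothetical usage on Cloud ML Engine (requires TensorFlow; the bucket
# name is a placeholder, not a real location):
#   import tensorflow as tf
#   labels = load_labels('gs://my-bucket/labels.txt', opener=tf.io.gfile.GFile)
```

Keeping the opener as a parameter means the same function can be exercised locally against a plain file before the job is submitted.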

However, if you prefer, you can also package the file up in your setup.py. There are multiple ways to do this, so I'll refer you to the official setuptools documentation.

Let me walk through a quick example:

Create a setup.py in the directory that contains your training package (often called trainer in Cloud ML Engine's samples, so I will proceed as if your code is structured the same way as the samples, including using trainer as the package name). The following is based on the docs you referenced, with one important change: it uses the package_data argument instead of include_package_data:

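from setuptools import find_packages
from setuptools import setup

# Extra PyPI dependencies for the trainer; left empty here as a placeholder.
REQUIRED_PACKAGES = []

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    # Ship trainer/labels.txt inside the trainer package.
    package_data={'trainer': ['labels.txt']},
    description='My trainer application package.'
)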

If you run python setup.py sdist, you can see that trainer/labels.txt was copied into the tarball.
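If you'd rather check that from Python than by eye, something like the following should work. It's only a sketch: sdist_members and sdist_contains are names I made up, and the tarball path in the comment assumes the archive name that sdist derives from the name and version in the setup.py above.

```python
import tarfile


def sdist_members(tarball_path):
    """Return the paths of all entries stored in an sdist tarball."""
    with tarfile.open(tarball_path, 'r:gz') as tar:
        return tar.getnames()


def sdist_contains(tarball_path, relative_path):
    """True if some archive entry ends with the given relative path."""
    return any(name.endswith(relative_path)
               for name in sdist_members(tarball_path))


# Hypothetical usage, assuming name='my_model' and version='0.1':
#   sdist_contains('dist/my_model-0.1.tar.gz', 'trainer/labels.txt')
```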

Then in your code, you can access the file like this:

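from pkg_resources import resource_filename

# Pass the package name ('trainer'), not the distribution name from
# setup.py ('my_model'); the path is then relative to the package.
labels_path = resource_filename('trainer', 'labels.txt')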
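If what you need is the labels themselves rather than a path, the standard library can read the packaged file directly. This is a sketch of an alternative, not what the samples do: read_packaged_labels is a hypothetical helper, and it assumes the file is UTF-8 with one label per line.

```python
import pkgutil


def read_packaged_labels(package='trainer', resource='labels.txt'):
    """Read a data file shipped inside an importable package."""
    # get_data returns bytes, or None when the package cannot be located
    # or its loader does not support get_data.
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise IOError('could not load %s from package %s' % (resource, package))
    return [line.strip() for line in data.decode('utf-8').splitlines()
            if line.strip()]
```

The package still has to be importable for this to work, so the same caveat about running locally applies.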

Note that to run this code locally, you're going to have to install your package: python setup.py install [--user].

And that's the primary reason I think storing the file on GCS might be easier.

