
How To List Down All The Dataflow Jobs Using Python Api

My use case involves fetching the job IDs of all streaming Dataflow jobs in my project and cancelling them, then updating the sources for my Dataflow job and re-running it.

Solution 1:

You can use the Dataflow REST API directly, like this:

from google.auth.transport.requests import AuthorizedSession
import google.auth

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'

# Application Default Credentials also return the current project ID;
# override project_id here only if you need to target a different project.
credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
location = 'europe-west1'

authed_session = AuthorizedSession(credentials)
response = authed_session.request(
    'GET', f'{base_url}{project_id}/locations/{location}/jobs')
print(response.json())

You have to install the google-auth dependency (pip install google-auth).

You can also add the query parameter ?filter=ACTIVE to return only active Dataflow jobs, which should match your streaming jobs.
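Putting that together with the original use case, here is a minimal sketch: list the active jobs, then keep only the streaming ones by checking each job's type field (JOB_TYPE_STREAMING). The helper names streaming_job_ids and list_active_streaming_jobs are hypothetical; the network imports live inside the function so the pure filtering helper can be used on its own.

```python
BASE_URL = 'https://dataflow.googleapis.com/v1b3/projects/'


def streaming_job_ids(jobs):
    """Return the IDs of streaming jobs from a jobs.list response body."""
    return [job['id'] for job in jobs
            if job.get('type') == 'JOB_TYPE_STREAMING']


def list_active_streaming_jobs(project_id, location):
    """List the IDs of all active streaming Dataflow jobs in one region."""
    # Imported here so streaming_job_ids works without google-auth installed.
    from google.auth.transport.requests import AuthorizedSession
    import google.auth

    credentials, _ = google.auth.default(
        scopes=['https://www.googleapis.com/auth/cloud-platform'])
    authed_session = AuthorizedSession(credentials)
    response = authed_session.request(
        'GET',
        f'{BASE_URL}{project_id}/locations/{location}/jobs',
        params={'filter': 'ACTIVE'})
    response.raise_for_status()
    return streaming_job_ids(response.json().get('jobs', []))
```

With the job IDs in hand, you can cancel each job and re-launch it after updating your sources.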

Solution 2:

In addition to using the REST API directly, you can use the generated Python bindings for the API in google-api-python-client. For simple calls it doesn't add much value, but when passing in many parameters it can be easier to work with than a raw HTTP library.

With that library, the jobs list call would look like:

from googleapiclient.discovery import build
import google.auth

credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
df_service = build('dataflow', 'v1b3', credentials=credentials)
# The generated bindings use camelCase parameter names (projectId, not project_id).
response = df_service.projects().locations().jobs().list(
    projectId=project_id,
    location='<region>').execute()
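To finish the original use case with these bindings, an active streaming job can be cancelled by updating it with a requestedState of JOB_STATE_CANCELLED. The sketch below assumes df_service is the client built above; cancel_request_body and cancel_job are hypothetical helper names.

```python
def cancel_request_body():
    # Cancelling a job only requires the requestedState field in the body.
    return {'requestedState': 'JOB_STATE_CANCELLED'}


def cancel_job(df_service, project_id, location, job_id):
    """Request cancellation of a single Dataflow job via jobs.update."""
    return df_service.projects().locations().jobs().update(
        projectId=project_id,
        location=location,
        jobId=job_id,
        body=cancel_request_body(),
    ).execute()
```

Cancellation is asynchronous: the call returns immediately, and the job transitions through JOB_STATE_CANCELLING before reaching JOB_STATE_CANCELLED.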
