Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does the redirection mean in apache beam (python)

In apache beam python sdk , I often see '>>' operator in pipeline procedure.

https://beam.apache.org/documentation/programming-guide/#pipeline-io

lines = p | 'ReadFromText' >> beam.io.ReadFromText('path/to/input-*.csv')

What does this mean?

like image 720
Yu Watanabe Avatar asked May 24 '18 23:05

Yu Watanabe


People also ask

What does PCollection stand for?

PCollection - A PCollection is a data set or data stream. The data that a pipeline processes is part of a PCollection. PTransform - A PTransform (or transform) represents a data processing operation, or a step, in your pipeline.

How do I know if PCollection is empty?

There is no way to check size of the PCollection without applying a PTransform on it (such as Count. globally() or Combine.

What is PCollection in Apache Beam?

A PCollection<T> is an immutable collection of values of type T . A PCollection can contain either a bounded or unbounded number of elements.

How does Apache Beam work?

Apache Beam transforms use PCollection objects as inputs and outputs for each step in your pipeline. A PCollection can hold a dataset of a fixed size or an unbounded dataset from a continuously updating data source. A transform represents a processing operation that transforms data.

What is beam in Apache Beam?

Apache Beam is an open-source, unified model that allows users to build a program by using one of the open-source Beam SDKs (Python is one of them) to define data processing pipelines. The pipeline is then translated by Beam Pipeline Runners to be executed by distributed processing backends, such as Google Cloud Dataflow.

How do I create a temporary redirect in Apache?

You can create a temporary redirect in Apache by adding a line like this to the virtual host entry in the server configuration file: This guide will cover a more in depth explanation of how to implement each kind of redirect in Apache, and go through some examples for specific use cases.

What are the different types of redirects?

There are a few different kinds of redirects, each of which mean something different to the client browser. The two most common types are temporary redirects and permanent redirects. Temporary redirects (response status code 302 Found) are useful if a URL temporarily needs to be served from a different location.

What version of Python does beam run on?

At the date of this article Apache Beam (2.8.1) is only compatible with Python 2.7, however a Python 3 version should be available soon. If you have python-snappy installed, Beam may crash. This issue is known and will be fixed in Beam 2.9.


1 Answers

>> is the right bitwise shift operator in Python. The equivalent dunder (double underscore) method is __rrshift__().

The implementation of Apache Beam in Python simply redefines __rrshift__() for the PTransform class so that names can be added to the transform. It's just special syntax. In your example, "ReadFromText" is the name of the transform.

Reference: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L445

like image 85
Andrew Nguonly Avatar answered Nov 15 '22 07:11

Andrew Nguonly