Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Power BI: How to use Python with multiple tables in the Power Query Editor?

Tags:

python

powerbi

How can you create a new table with a Python script that uses two existing tables as input? For example by performing a left join using pandas merge?

Some details:

Using Home > Edit queries you can utilize Python under Transform > Run Python Script. This opens a Run Python Script dialog box where your're told that '#dataset' holds the input data for this script. And you'll find the same phrase if you just click OK and look at the formula bar:

= Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"])

This also adds a new step under Applied Steps called Run Python script where you can edit the Python script by clicking the gear symbol on the right:

enter image description here

How can you change that setup to reference more than one table?


Sample data

Here are two tables that can be stored as CSV files and loaded using Home > Get Data > Text/CSV

Table1

Date,Value1
2108-10-12,1
2108-10-13,2
2108-10-14,3
2108-10-15,4
2108-10-16,5

Table2

Date,Value2
2108-10-12,10
2108-10-13,11
2108-10-14,12
2108-10-15,13
2108-10-16,14
like image 654
vestland Avatar asked Aug 21 '18 11:08

vestland


People also ask

How do I run a Python script in power query?

Open Power Query Editor by selecting Transform data from the Home tab in Power BI Desktop. In the Transform tab, select Run Python Script and the Run Python Script editor appears (shown in the next step).

Can Python connect to Power BI dataset?

Open Power BI Desktop and click on Get Data. Select Other and then select Python Script from the list of data sources. Click on Connect. A new dialog box for writing the Python script will open.

How do you query two tables in Power BI?

In Power BI Desktop you can join two tables with Merge menu item in the Query Editor, in Home tab, Under Combine, Merge Queries. The Merge Window will appear with ability to select first table (Left part of the join), and the second table (Right part of the join).


2 Answers

This is the same challenge that has been described for R scripts here. That setup should work for Python too. However, I've found that that approach has one drawback: It stores the new joined or calculated table as an edited version of one of the previous tables. The following suggestion will demonstrate how you can produce a completely new calculated table without altering the input tables (except changing the data type of the Date columns from Date to Text because of this.)

Short answer:

In the Power Query editor, follow these steps:

  1. Change the data type of the Date columns in both columns to Text.

  2. Click Enter Data. Only click OK.

  3. Activate the new Table3 and use Transform > Run Python Script. Only click OK.

  4. Activate the formula bar and replace what's in it with = Python.Execute("# Python:",[df1=Table1, df2=Table2]). Click Enter.

  5. If you're prompted to do so, click Edit Permission and Run in the next step.

  6. Under Applied Steps, in the new step named Run Python Script, click the gear icon to open the Run Python Script editor.

  7. Insert the snippet below and click OK.

Code:

import pandas as pd
df3 = pd.merge(df1, df2, how = 'left', on = ['Date'])
df3['Value3'] = df1['Value1']*df2['Value2']

Next to df3, click Table, and that's it:

enter image description here

The details:

The list above will have to be followed very carefully to get things working. So here are all of the dirty little details:

1. Load the tables as CSV files in Power BI Desktop using Get Data.

2. Click Edit Queries.

3. In Table1, Click the symbol next to the Date column, select Text and click Replace Current

enter image description here

4. Do the same for Table2

5. On the Home tab, click Enter Data

6. In the appearing box, do nothing else than clicking OK.

enter image description here

7. This will insert an empty table named Table3 under Queries, and that's exactly what we want:

enter image description here

8. Go to the Transform tab and click Run Python Script:

enter image description here

9. This opens the Run Python Script editor. And you can start writing you scripts right here, but that will make things unnecessarily complicated in the next steps. So do nothing but click OK:

enter image description here

10. In the formula bar you will se the formula = Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"]). And notice that you've got a new step under Applied Steps named Run Python Script:

enter image description here

11. There are several interesting details in the screenshot above, but first we're going to break down the arguments of the function = Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"]).

The part "# 'dataset'" holds the input data for this script#(lf)" simply inserts the comment that you can see in the Python Script Editor. So it's not important, but you can't just leave it blank either. I like to use something shorter like "# Python:".

The part [dataset=#"Changed Type"] is a pointer to the empty Table3 in the state that it is under Changed Type. So if the last thing that you do before inserting a Python Script is something else than changing data types, this part will look different. The table is then made available in your python script using dataset as a pandas data frame. With this in mind, we can make som very useful changes to the formula:

12. Change the formula bar to = Python.Execute("# Python:",[df1=Table1, df2=Table2]) and hit Enter. This will make Table1 and Table2 available for your Python scripts as two pandas dataframes named df1 and df2, respectively.

13. Click the gear (or is it a flower?) icon next to Run Python script under Applied Steps:

enter image description here

14. Insert the following snippet:

Code:

import pandas as pd
df3 = pd.merge(df1, df2, how = 'left', on = ['Date'])
df3['Value3'] = df1['Value1']*df2['Value2']

This will join df1 and df2 on the Date column, and insert a new calculated column named Value3. Not too fancy, but with this setup you can do anything you want with your data in the world of Power BI and with the power of Python.

enter image description here

15. Click OK and you'll se this:

enter image description here

You'll see df3 listed under the input dataframes df1 and df2 in the blue square. If you've assigned any other dataframes as a step in your calculations in the Python script, they will be listed here too. In order to turn it into an accessible table for Power BI, just click Table as indicated by the green arrow.

16. And that's it:

enter image description here

Note that the data type of the Date column is set to Date by default, but you can change that to Text as explained earlier.

Click Home > Close&Apply to exit the Power Query Editor and go back to where it all started in Power BI Desktop.

like image 93
vestland Avatar answered Oct 05 '22 13:10

vestland


You can create a Blank Query and replace its contents in Advanced Editor:

let
    Source = Python.Execute("# Python:#(lf)",[df1=Table1, df2=Table2])
in
    Source

After you can click on the gear by the Source to add your python code where you can use df1 and df2.

like image 21
Vitalijs Arkulinskis Avatar answered Oct 05 '22 13:10

Vitalijs Arkulinskis