
From Excel to list of tuples

Tags:

python

excel

I have an Excel (.xlsx) file that has two columns of phrases. For example:

John  I have a dog     
Mike  I need a cat
Nick  I go to school

I want to import it into Python and get a list of tuples like:

[('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school'), ...]

What could I do?

Gigi Russo asked May 08 '20 09:05


5 Answers

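You can read the Excel file with pd.read_excel. Pay attention to whether the file has a header row: by default the first row is used for the column names, and you pass header=None if there is no header.

read_excel returns a DataFrame. In my case, the file has a header row and I get the following:

import pandas as pd

df = pd.read_excel("data.xlsx")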
print(df)
#         name         message
# 0       John    I have a dog
# 1       Mike    I need a cat
# 2       Nick  I go to school

Then you can get the values of the DataFrame with to_numpy, which returns a NumPy array.

If you want a list, use the NumPy method tolist to convert the array to a list:

out = df.to_numpy().tolist()
print(out)
# [['John', 'I have a dog'],
#  ['Mike', 'I need a cat'],
#  ['Nick', 'I go to school']]

As you can see, the output is a list of lists. If you want a list of tuples, convert each row:

# for getting list of tuples
out = [tuple(elt) for elt in out]
print(out)
# [('John', 'I have a dog'), 
#  ('Mike', 'I need a cat'), 
#  ('Nick', 'I go to school')]

Note: An older solution was to use the values attribute instead of to_numpy(). However, the pandas documentation recommends to_numpy() over values.
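
If the intermediate list of lists is not needed, DataFrame.itertuples can produce the tuples directly. This is a minimal sketch, assuming a small in-memory DataFrame as a stand-in for the result of pd.read_excel; the column names are the ones from the example output above.

```python
import pandas as pd

# Hypothetical in-memory stand-in for pd.read_excel("data.xlsx")
df = pd.DataFrame(
    {
        "name": ["John", "Mike", "Nick"],
        "message": ["I have a dog", "I need a cat", "I go to school"],
    }
)

# index=False leaves out the row index;
# name=None yields plain tuples instead of namedtuples
out = list(df.itertuples(index=False, name=None))
print(out)
```

Unlike to_numpy(), this does not first convert the whole frame to a single array, which can matter when the columns hold different types.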

Hope that helps!

Alexandre B. answered Oct 11 '22 01:10


import pandas as pd

file_path = r'filepath.xlsx'
xlsx = pd.read_excel(file_path)
names = xlsx['names']    # column headers as they appear in your file
scores = xlsx['scores']
# zip pairs the two columns row by row
my_list = list(zip(names, scores))
print(my_list)

You need to adjust file_path and the column names (names and scores here) to match your file. In addition, if pandas is not installed yet, run pip install pandas in the terminal first.
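
When building the tuples from two columns, the pairing has to go row by row. A comprehension with two for clauses combines every value of the first column with every value of the second, while zip matches them by position. A small sketch in plain Python, with hypothetical lists standing in for the two columns:

```python
# Hypothetical values standing in for the two spreadsheet columns
names = ["John", "Mike", "Nick"]
phrases = ["I have a dog", "I need a cat", "I go to school"]

# Two for clauses: every name combined with every phrase (3 * 3 tuples)
all_combinations = [(name, phrase) for name in names for phrase in phrases]

# zip: the i-th name paired with the i-th phrase (3 tuples)
row_pairs = list(zip(names, phrases))

print(len(all_combinations))
print(row_pairs)
```

For the list asked for in the question, the zip form is the one that matches the rows of the sheet.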

熊水斌 answered Oct 11 '22 01:10


You can use openpyxl:

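import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active  # the workbook's active sheet

# Adjust the range to cover your data; each row is a tuple of two cells
cells = ws['A1:B3']

rows = []
for c1, c2 in cells:
    rows.append((c1.value, c2.value))

print(rows)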

Andrea Baldini answered Oct 11 '22 01:10


You can use pandas DataFrames to read and work with Excel files very easily. Note that the solution below actually produces a list of lists, with the column headers as the first element. I hope it helps anyway. This is my first answer on Stack Overflow, and I am not the most experienced programmer. ^^

import pandas as pd

df = pd.read_excel(r'PathOfExcelFile.xlsx')
print(df)
# header row first, then one inner list per data row
mylist = [df.columns.values.tolist()] + df.values.tolist()
print(mylist)

https://datatofish.com/read_excel/

https://datatofish.com/convert-pandas-dataframe-to-list/

Waynaeri answered Oct 11 '22 02:10


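You need to install pandas together with an Excel engine. For .xlsx files that engine is openpyxl; xlrd, which older pandas versions used, reads only .xls files since its 2.0 release.

pip install pandas
pip install openpyxl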

then

import pandas as pd

df = pd.read_excel("dataset.xlsx", header=None)   #header=None means no header
mylist = list(map(tuple, df.to_numpy()))
#output
#[('John', '  I have a dog     '), ('Mike ', ' I need a cat'), ('Nick ', ' I go to school')]
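
The padding in the strings above comes straight from the cell contents. If it is unwanted, one option is to strip each string while building the tuples. This is a minimal sketch in plain Python, with hypothetical rows hard-coded in place of the DataFrame values; strip_cell is an illustrative helper, not part of pandas.

```python
# Hypothetical rows with padding, as they might come out of a spreadsheet
rows = [
    ("John", "  I have a dog     "),
    ("Mike ", " I need a cat"),
    ("Nick ", " I go to school"),
]


def strip_cell(value):
    """Strip surrounding whitespace from strings; leave other types as-is."""
    return value.strip() if isinstance(value, str) else value


cleaned = [tuple(strip_cell(value) for value in row) for row in rows]
print(cleaned)
```

The isinstance check keeps numbers, dates, and empty cells untouched, since those have no strip method.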

Explanation:

read_excel reads the Excel file into a pandas DataFrame:

df = pd.read_excel("filename.xlsx", header = None)

#        0                1
# 0  John        I have a dog
# 1  Mike        I need a cat
# 2  Nick        I go to school

If the file has no header row, pass header=None, as above.

If a header row exists, the default is enough:

df = pd.read_excel("filename.xlsx")

#     Name        Status     <-headers
# 0  John     I have a dog
# 1  Mike     I need a cat
# 2  Nick     I go to school

to_numpy() converts the DataFrame to a NumPy array. map then passes each row of that array to tuple, which turns the row into a tuple.

mylist = list(map(tuple, df.to_numpy()))

References: pandas.read_excel, map

Avishka Dambawinna answered Oct 11 '22 02:10