I have an Excel (.xlsx) file that has two columns of phrases. For example:
John I have a dog
Mike I need a cat
Nick I go to school
I want to import it into Python and get a list of tuples like:
[('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school'), ...]
What could I do?
You can read the Excel file using pd.read_excel. You need to pay attention to whether the file has a header row or not.
As you said, it returns a dataframe. In my case, I have the following.
import pandas as pd
df = pd.read_excel("data.xlsx")
print(df)
# name message
# 0 John I have a dog
# 1 Mike I need a cat
# 2 Nick I go to school
Then, you can get the values of the dataframe using to_numpy. It returns a numpy array.
If you want a list, use the numpy method tolist to convert the array to a list:
out = df.to_numpy().tolist()
print(out)
# [['John', 'I have a dog'],
# ['Mike', 'I need a cat'],
# ['Nick', 'I go to school']]
As you can see, the output is a list of lists. If you want a list of tuples, just cast each row:
# for getting list of tuples
out = [tuple(elt) for elt in out]
print(out)
# [('John', 'I have a dog'),
# ('Mike', 'I need a cat'),
# ('Nick', 'I go to school')]
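As a possible shortcut, DataFrame.itertuples can produce the tuples in one step. The sketch below is only an illustration: it builds the dataframe in memory instead of reading a file, and the column names name and message are assumed from the printed example.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the dataframe read from the Excel file.
df = pd.DataFrame(
    {
        "name": ["John", "Mike", "Nick"],
        "message": ["I have a dog", "I need a cat", "I go to school"],
    }
)

# index=False leaves out the row index; name=None yields plain tuples
# instead of namedtuples.
rows = list(df.itertuples(index=False, name=None))
print(rows)
# [('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school')]
```

This skips the intermediate list of lists, at the cost of being a little less explicit about each step.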
Note:
An older solution was to call values instead of to_numpy(). However, the documentation clearly recommends using to_numpy and avoiding values.
Hope that helps!
import pandas as pd
file_path = r'filepath.xlsx'
xlsx = pd.read_excel(file_path)
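# 'names' and 'scores' are the column headers in the sheet
names = xlsx['names']
scores = xlsx['scores']
# zip pairs the two columns row by row; a nested loop would give every combination
my_list = list(zip(names, scores))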
print(my_list)
You need to adapt file_path and the column names (names and scores) to your file. In addition, if you have not installed pandas yet, run pip install pandas in the terminal first.
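To see why the way the two columns are combined matters, here is a small sketch using plain lists as stand-ins for the two columns (the values are taken from the question). Pairing element by element is what zip does, while a nested comprehension produces every name combined with every phrase.

```python
# Plain lists standing in for the two spreadsheet columns.
names = ["John", "Mike", "Nick"]
scores = ["I have a dog", "I need a cat", "I go to school"]

# Row-by-row pairing: one tuple per spreadsheet row.
paired = list(zip(names, scores))

# Nested comprehension: the Cartesian product, 3 * 3 tuples.
crossed = [(n, s) for n in names for s in scores]

print(paired)
# [('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school')]
print(len(crossed))
# 9
```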
You can use openpyxl:
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
cells = ws['A1:B3']
l = []
for c1, c2 in cells:
    l.append((c1.value, c2.value))
print(l)
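If the number of rows is not known in advance, one option is to let openpyxl iterate over all used rows rather than hard-coding a range such as A1:B3. The sketch below builds a small workbook in memory purely for illustration; with a real file you would keep load_workbook as above.

```python
import openpyxl

# Hypothetical in-memory workbook standing in for test.xlsx.
wb = openpyxl.Workbook()
ws = wb.active
for row in [("John", "I have a dog"), ("Mike", "I need a cat"), ("Nick", "I go to school")]:
    ws.append(row)

# values_only=True yields tuples of cell values instead of Cell objects;
# min_col/max_col restrict the iteration to columns A and B.
rows = list(ws.iter_rows(min_col=1, max_col=2, values_only=True))
print(rows)
# [('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school')]
```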
You can use pandas DataFrames to read and work with Excel files very easily. Note that the solution below actually results in a list of lists, with the header row first. I hope it helps anyway. This is my first response on StackOverflow, and I am not the most experienced programmer. ^^
import pandas as pd
df = pd.read_excel(r'PathOfExcelFile.xlsx')
print(df)
# header row first, then the data rows
mylist = [df.columns.values.tolist()] + df.values.tolist()
print(mylist)
https://datatofish.com/read_excel/
https://datatofish.com/convert-pandas-dataframe-to-list/
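You need to install pandas and an engine for reading Excel files. The xlrd module used to be the usual choice, but xlrd 2.0 and later read only .xls files, so recent pandas versions use openpyxl for .xlsx:
pip install pandas
pip install openpyxl
then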
import pandas as pd
df = pd.read_excel("dataset.xlsx", header=None)  # header=None means the sheet has no header row
mylist = list(map(tuple, df.to_numpy()))
# output
# [('John', ' I have a dog '), ('Mike ', ' I need a cat'), ('Nick ', ' I go to school')]
Explanation:
read_excel reads the Excel file into a pandas DataFrame:
df = pd.read_excel("filename.xlsx", header = None)
# 0 1
# 0 John I have a dog
# 1 Mike I need a cat
# 2 Nick I go to school
Use header=None if the sheet has no header row.
If a header exists:
df = pd.read_excel("filename.xlsx")
# Name Status <-headers
# 0 John I have a dog
# 1 Mike I need a cat
# 2 Nick I go to school
to_numpy() converts the DataFrame to a NumPy array. map then passes each row of that array to tuple(), so every row becomes a tuple.
mylist = list(map(tuple, df.to_numpy()))
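The sample output earlier in this answer contains stray spaces around some values (for example 'Mike ' and ' I have a dog '). If that is not wanted, one possible cleanup is to strip every column before building the tuples. This is a sketch that assumes all cells are strings; the dataframe is built in memory here only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe read with header=None,
# including the stray spaces seen in the sample output.
df = pd.DataFrame(
    [
        ["John", " I have a dog "],
        ["Mike ", " I need a cat"],
        ["Nick ", " I go to school"],
    ]
)

# Strip leading and trailing whitespace column by column (string columns only).
cleaned = df.apply(lambda col: col.str.strip())

mylist = list(map(tuple, cleaned.to_numpy()))
print(mylist)
# [('John', 'I have a dog'), ('Mike', 'I need a cat'), ('Nick', 'I go to school')]
```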
Refer: pandas.read_excel, map