
Pulling historical channel messages python

I am attempting to create a small dataset by pulling messages and responses from a Slack channel I am a part of. I would like to use Python to pull the data from the channel, but I am having trouble figuring out my API key. I have created an app on Slack, but I am not sure how to find the API key. I can see my client secret, signing secret, and verification token, but I can't find an API key.

Here is a basic example of what I believe I am trying to accomplish:

import slack
sc = slack.SlackClient("api key")
sc.api_call(
  "channels.history",
  channel="C0XXXXXX"
)

I am willing to just download the data manually if that is possible as well. Any help is greatly appreciated.

PDPDPDPD asked Jan 27 '23 03:01


1 Answer

messages

See below for example code showing how to pull messages from a channel in Python.

  • It uses the official Python Slack library and calls conversations_history with paging. It will therefore work with any type of channel and can fetch large amounts of messages if needed.
  • The result will be written to a file as a JSON array.
  • You can specify the channel and the maximum number of messages to retrieve.

threads

Note that the conversations.history endpoint will not return thread messages. Those have to be retrieved additionally, with one call to conversations.replies for every thread you want to retrieve messages for.

Threads can be identified in the messages for each channel by checking for the thread_ts property in the message. If it exists, there is a thread attached to it. See this page for more details on how threads work.
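
As a rough sketch of the thread retrieval described above: the helper below walks the messages already fetched from the channel and calls conversations_replies once per thread, following the cursor if a thread spans several pages. The function name fetch_thread_replies and the pause_seconds parameter are made up for this illustration, and it assumes client is a WebClient like the one in the example code. Note that the API includes the parent message in the replies it returns.

```python
from time import sleep


def fetch_thread_replies(client, channel, messages, pause_seconds=1.0):
    """Return a dict mapping each thread's thread_ts to its list of messages.

    Hypothetical helper: `client` is assumed to expose conversations_replies
    the way the Slack WebClient does.
    """
    replies_by_thread = {}
    for message in messages:
        thread_ts = message.get("thread_ts")
        # Only thread parents are followed: their thread_ts equals their own ts.
        if thread_ts is None or thread_ts != message.get("ts"):
            continue
        thread_messages = []
        cursor = None
        while True:
            kwargs = {"channel": channel, "ts": thread_ts}
            if cursor:
                kwargs["cursor"] = cursor
            response = client.conversations_replies(**kwargs)
            thread_messages.extend(response["messages"])
            metadata = response.get("response_metadata") or {}
            cursor = metadata.get("next_cursor")
            if not response.get("has_more") or not cursor:
                break
            sleep(pause_seconds)  # pause between pages because of rate limits
        replies_by_thread[thread_ts] = thread_messages
    return replies_by_thread
```

With the example script, this could be called as fetch_thread_replies(client, CHANNEL, messages_all) after all pages have been fetched.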

IDs

This script will not replace IDs with names, though. If you need that, here are some pointers on how to implement it:

  • You need to replace IDs for users, channels, bots, and usergroups (if on a paid plan)
  • You can fetch the lists for users, channels, and usergroups from the API with users_list, conversations_list, and usergroups_list respectively. Bots need to be fetched one by one with bots_info (if needed)
  • IDs occur in many places in messages:
    • user top level property
    • bot_id top level property
    • as a link in any property that allows text, e.g. <@U12345678> for users or <#C1234567> for channels. Those can occur in the top level text property, but also in attachments and blocks.
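
To illustrate the last point, here is a minimal sketch that replaces user and channel links inside a text string. The function name replace_ids and the two lookup dictionaries are assumptions for this example: you would build them yourself from the users_list and conversations_list results (ID as key, name as value). It only handles plain text, so attachments and blocks would need the same treatment applied to their text fields.

```python
import re

# Links look like <@U12345678> or <#C1234567>, optionally followed by |label
USER_LINK = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")
CHANNEL_LINK = re.compile(r"<#(C[A-Z0-9]+)(?:\|[^>]*)?>")


def replace_ids(text, user_names, channel_names):
    """Replace user and channel links in text with readable names.

    Hypothetical helper: user_names and channel_names map IDs to names.
    Links with unknown IDs are left unchanged.
    """
    def user_sub(match):
        name = user_names.get(match.group(1))
        return "@" + name if name else match.group(0)

    def channel_sub(match):
        name = channel_names.get(match.group(1))
        return "#" + name if name else match.group(0)

    text = USER_LINK.sub(user_sub, text)
    return CHANNEL_LINK.sub(channel_sub, text)
```

For example, replace_ids("Hi <@U12345678>", {"U12345678": "alice"}, {}) gives "Hi @alice".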

Example code

import os
import slack
import json
from time import sleep

CHANNEL = "C12345678"
MESSAGES_PER_PAGE = 200
MAX_MESSAGES = 1000

# init web client
client = slack.WebClient(token=os.environ['SLACK_TOKEN'])

# get first page
page = 1
print("Retrieving page {}".format(page))
response = client.conversations_history(
    channel=CHANNEL,
    limit=MESSAGES_PER_PAGE,
)
assert response["ok"]
messages_all = response['messages']

# get additional pages while below the max message count and while there are more
while len(messages_all) + MESSAGES_PER_PAGE <= MAX_MESSAGES and response['has_more']:
    page += 1
    print("Retrieving page {}".format(page))
    sleep(1)   # need to wait 1 sec before next call due to rate limits
    response = client.conversations_history(
        channel=CHANNEL,
        limit=MESSAGES_PER_PAGE,
        cursor=response['response_metadata']['next_cursor']
    )
    assert response["ok"]
    messages = response['messages']
    messages_all = messages_all + messages

print(
    "Fetched a total of {} messages from channel {}".format(
        len(messages_all),
        CHANNEL
))

# write the result to a file
with open('messages.json', 'w', encoding='utf-8') as f:
    json.dump(
        messages_all,
        f,
        sort_keys=True,
        indent=4,
        ensure_ascii=False
    )

Erik Kalkoken answered Jan 28 '23 18:01