Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Automating a Job at Work: Importing Powerpoint Bullet Text into an Excel Sheet

I have been asked to automate a particular task at work today which takes up a lot of our time! The following is what needs to be done and I would appreciate any help on how I can do this (Implementation Advice) within the realms of my knowledge, if possible.

Problem

I have a PowerPoint document (.ppt). I would like to extract text from there (the text is in bullet point format). I would like to insert these bullets points into an Excel sheet, each bullet point should be a row. I would also like to put in the adjacent column the page this bullet point text was taken from.

So, basically: Extract from ppt --> Insert into Excel sheet each row being a bullet point.

Technologies available to me

Perl, PHP and Java.

I would prefer PHP to be honest as this is my primary language, but I am happy to consider anything else you guys/gals think is best. Second would be Perl and then Java. I do not want to be compiling classes and installing the JDK just for this! :)

Key Questions

  • How do you reference a bullet point?
  • Am I likely to end up with just a load of unstructured text in the Excel sheet?
  • Are there any barriers to reading from a ppt file?

Update

I would consider MS technologies (VB, etc.) if it makes life easier but I have never used it and I despise MS technology! Hope I don't get flamed by the evangelists! :)

like image 794
Abs Avatar asked Nov 28 '22 19:11

Abs


2 Answers

It can be done with Perl. Pretty much anything you can do with VBA can be done with Perl via Win32::OLE. I've used the Win32::OLE module to work with MS-Office documents, both extracting and creating content. It's been awhile though. Start here, http://win32.perl.org/wiki/index.php?title=Win32_Perl_Modules about the middle of the page.

The VBA documentation on each of the objects is useful for reference, finding what objects there are and the methods and properties available on them.

like image 156
Rob K Avatar answered Dec 05 '22 11:12

Rob K


Here is a sample script using Win32::OLE.

By the way, once you have converted the slides into a format you can process, you can use Spreadsheet::WriteExcel on non-MS systems to write the output. Therefore, I would recommend two programs: One to transform the PowerPoint documents and another to generate the Excel files.

Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools → Macro → Visual Basic Editor. Once you are in the editor, hit F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.

#!/usr/bin/perl

use strict;
use warnings;

use FindBin qw( $Bin );
use File::Spec::Functions qw( catfile );

use Win32::OLE;
use Win32::OLE::Const 'Microsoft PowerPoint';
$Win32::OLE::Warn = 3;

my $ppt = get_ppt();
$ppt->{Visible} = 1;

my $ppt_file = catfile $Bin, 'test.ppt';
my $doc = $ppt->Presentations->open( $ppt_file );
my $slides = $doc->Slides;
my $num_slides = $slides->Count;

for my $slide_idx (1 .. $num_slides) {
    print "=== Begin Slide $slide_idx ===\n";

    my $slide = $doc->Slides->Item( $slide_idx );
    my $shapes = $slide->Shapes;
    my $num_shapes = $shapes->Count;

    for my $shape_idx (1 .. $num_shapes) {
        my $shape = $shapes->Item($shape_idx);
        next unless $shape->HasTextFrame;

        my $pars = $shape->TextFrame->TextRange->Paragraphs;
        my $num_pars = $pars->Count;
        for my $par_idx (1 .. $num_pars) {
            my $par = $pars->Paragraphs($par_idx,1);
            print_par( $par );
        }
    }

    print "=== End Slide $slide_idx ===\n\n";
}

sub print_par {
    my ($par) = @_;
    my @bullets = qw( - * > + = @ );

    my $bullet_format = $par->ParagraphFormat->Bullet;
    my $bullet_type = $bullet_format->Type;

    my $bullet_char = '';

    if ($bullet_type == ppBulletNumbered) {
        $bullet_char = $bullet_format->Number . "\t";
    }
    elsif( $bullet_type == ppBulletUnnumbered ) {
        # Need a Unicode => ASCII mapping if you want to use
        # $bullet_format->Character
        my $indent = $par->IndentLevel % scalar @bullets;
        $bullet_char = $bullets[$indent] . "\t";
    }

    my $text = $par->Text;
    $text =~ s/\s+$//;

    print $bullet_char, $text,"\n";
}

sub get_ppt {
    my $app;
    eval {
        $app = Win32::OLE->GetActiveObject('PowerPoint.Application');
    };

    die "$@\n" if $@;

    unless($app) {
        $app = Win32::OLE->new(
            'PowerPoint.Application', sub { $_[0]->Quit }
        ) or die "Oops, cannot start PowerPoint: ",
                 Win32::OLE->LastError, "\n";
    }
    return $app;
}
like image 39
Sinan Ünür Avatar answered Dec 05 '22 10:12

Sinan Ünür