Introduction

Looking for a simple and efficient way to extract text from PowerPoint presentations programmatically in C#? In this article, we will explore how to achieve this using the PresentationTextExtractor class from the Slidize.Plugins library.

The PresentationTextExtractor class is a plugin within the Slidize.Plugins library designed to extract text from presentations in the PowerPoint 97-2003 and Microsoft Office Open XML formats.

How to Extract Raw Text from a Presentation

Let's start with a simple example that extracts the raw text from a PowerPoint presentation. The PresentationTextExtractor.Process method takes the path to the input presentation and an extraction mode, and returns the text extracted in that mode.

using System;
using Slidize;

SlideText[] rawSlidesText = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Unarranged);
foreach (var slideText in rawSlidesText)
{
    // Print the text extracted from the slide
    Console.WriteLine(slideText.Text);

    // Print the text extracted from the master of the slide
    Console.WriteLine(slideText.MasterText);

    // Print the text extracted from the layout of the slide
    Console.WriteLine(slideText.LayoutText);

    // Print the notes text extracted from the slide
    Console.WriteLine(slideText.NotesText);

    // Print the comments text extracted from the slide
    Console.WriteLine(slideText.CommentsText);
}

In this example, we specify the path to the input presentation (presentation.pptx) and the TextExtractionMode.Unarranged extraction mode. The result is an array of SlideText objects, one slide after another. In this mode, the text is returned without regard to its position on the slide.
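When a presentation has many slides, the unlabelled output of the example above can be hard to read. The following is a minimal sketch of one way to label each field and skip empty ones. It uses only the Process method and the SlideText properties shown above. The PrintIfPresent helper is hypothetical (it is not part of Slidize), and the sketch assumes that the array order matches the slide order and that a field with no content comes back as null, empty, or whitespace.

```csharp
using System;
using Slidize;

// Same call as in the example above; "presentation.pptx" is a placeholder path.
SlideText[] slides = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Unarranged);

for (int i = 0; i < slides.Length; i++)
{
    // Assumption: the array index follows the slide order, so slide numbers start at 1.
    Console.WriteLine($"--- Slide {i + 1} ---");

    PrintIfPresent("Text", slides[i].Text);
    PrintIfPresent("Notes", slides[i].NotesText);
    PrintIfPresent("Comments", slides[i].CommentsText);
}

// Hypothetical helper (not part of Slidize): prints a labelled field and skips empty ones.
static void PrintIfPresent(string label, string value)
{
    if (!string.IsNullOrWhiteSpace(value))
    {
        Console.WriteLine($"{label}: {value.Trim()}");
    }
}
```

Master and layout text are left out here only to keep the output short; they can be printed the same way through MasterText and LayoutText.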

How to Extract Arranged Text from a Presentation

In some cases, it is necessary to extract the text in the same order in which it is positioned on the presentation slide. The PresentationTextExtractor plugin offers the TextExtractionMode.Arranged mode for this purpose.

using Slidize;

SlideText[] arrangedSlidesText = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Arranged);
using (var outputFile = new System.IO.StreamWriter("presentation-text.txt", false))
{
    foreach (var slideText in arrangedSlidesText)
    {
        // Save the arranged text extracted from the slide
        outputFile.WriteLine(slideText.Text);
    }
}

In this example, we specify the path to the input presentation (presentation.pptx) and the TextExtractionMode.Arranged extraction mode. The extracted text is saved to a text file (presentation-text.txt).
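The file written above contains the slide text back to back, with nothing marking where one slide ends and the next begins. As a rough sketch, the output could be given per-slide headers and include the speaker notes. It relies only on the Process method and the Text and NotesText properties shown earlier. The header format and the "[Notes]" marker are illustrative choices, and the sketch again assumes that the array order matches the slide order.

```csharp
using System.IO;
using System.Text;
using Slidize;

// Same call as in the example above; both file names are placeholders.
SlideText[] slides = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Arranged);

var report = new StringBuilder();
for (int i = 0; i < slides.Length; i++)
{
    // Assumption: the array index follows the slide order, so slide numbers start at 1.
    report.AppendLine($"=== Slide {i + 1} ===");
    report.AppendLine(slides[i].Text);

    // Append the speaker notes only when the slide has any.
    if (!string.IsNullOrWhiteSpace(slides[i].NotesText))
    {
        report.AppendLine("[Notes]");
        report.AppendLine(slides[i].NotesText);
    }

    // Blank line between slides.
    report.AppendLine();
}

// Write the whole report in one call, encoded as UTF-8 (Encoding.UTF8 emits a byte order mark).
File.WriteAllText("presentation-text.txt", report.ToString(), Encoding.UTF8);
```

Building the report in memory keeps the sketch short. For very large presentations, writing slide by slide through a StreamWriter, as in the example above, avoids holding all the text at once.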