Introduction
Looking for a simple and efficient way to extract text from PowerPoint presentations programmatically in C#? In this article, we will explore how to achieve this using the PresentationTextExtractor class from the Slidize.Plugins library.
The PresentationTextExtractor class is a plugin in the Slidize.Plugins library designed to extract text from PowerPoint 97-2003 and Microsoft Office Open XML presentations.
How to Extract Raw Text from Presentation
Let's start with a simple example that extracts the raw text from a PowerPoint presentation. The PresentationTextExtractor.Process method takes the path to the input presentation and an extraction mode, and returns the text extracted in that mode.
using Slidize;

SlideText[] rawSlidesText = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Unarranged);

foreach (var slideText in rawSlidesText)
{
    // Print the text extracted from the slide
    Console.WriteLine(slideText.Text);

    // Print the text extracted from the master of the slide
    Console.WriteLine(slideText.MasterText);

    // Print the text extracted from the layout of the slide
    Console.WriteLine(slideText.LayoutText);

    // Print the notes text extracted from the slide
    Console.WriteLine(slideText.NotesText);

    // Print the comments text extracted from the slide
    Console.WriteLine(slideText.CommentsText);
}
In this example, we specify the path to the input presentation (presentation.pptx) and the TextExtractionMode.Unarranged extraction mode. The result is an array of SlideText objects, one per slide, in which the text is returned without regard to its position on the slide.
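Because the result is organized slide by slide, it is easy to post-process. The following is a minimal sketch, not part of the Slidize library, that reports which slides mention a keyword. The SlideSearch class and its FindSlidesContaining method are hypothetical names introduced for illustration, and the helper works on plain strings, assuming that slideText.Text can be passed in as a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; it is not part of Slidize.Plugins.
public static class SlideSearch
{
    // Returns the 1-based numbers of the slides whose text contains the keyword.
    // slideTexts holds one string per slide, in slide order (assumed input shape).
    public static List<int> FindSlidesContaining(IReadOnlyList<string> slideTexts, string keyword)
    {
        var matches = new List<int>();
        for (int i = 0; i < slideTexts.Count; i++)
        {
            // Treat a missing text as empty so the search never throws.
            string text = slideTexts[i] ?? string.Empty;

            // Case-insensitive search for the keyword.
            if (text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(i + 1);
            }
        }
        return matches;
    }
}
```

To use it with the extraction result, collect the Text value of each SlideText into a list in slide order and pass that list together with the keyword.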
How to Extract Arranged Text from Presentation
In some cases, the text must be extracted in the same order in which it appears on the presentation slide. The PresentationTextExtractor plugin offers the TextExtractionMode.Arranged mode for this purpose.
using Slidize;

SlideText[] arrangedSlidesText = PresentationTextExtractor.Process("presentation.pptx", TextExtractionMode.Arranged);

using (var outputFile = new System.IO.StreamWriter("presentation-text.txt", false))
{
    foreach (var slideText in arrangedSlidesText)
    {
        // Save the arranged text extracted from the slide
        outputFile.WriteLine(slideText.Text);
    }
}
In this example, we specify the path to the input presentation (presentation.pptx) and the TextExtractionMode.Arranged extraction mode. The extracted text is saved to the text file (presentation-text.txt).
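When the text of all slides goes into a single file, it can help to mark where each slide begins. The sketch below is one possible way to do that, not something the library provides. The SlideTextWriter class and its WriteWithHeaders method are hypothetical names, and the helper takes plain strings, assuming that slideText.Text can be passed in as a string.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; it is not part of Slidize.Plugins.
public static class SlideTextWriter
{
    // Writes each slide's text preceded by a "--- Slide N ---" header line.
    // slideTexts holds one string per slide, in slide order (assumed input shape).
    public static void WriteWithHeaders(TextWriter writer, IReadOnlyList<string> slideTexts)
    {
        for (int i = 0; i < slideTexts.Count; i++)
        {
            writer.WriteLine($"--- Slide {i + 1} ---");

            // Write an empty line for a slide without text.
            writer.WriteLine(slideTexts[i] ?? string.Empty);
        }
    }
}
```

Any TextWriter can be passed as the first argument, so the same helper can write to a StreamWriter opened on presentation-text.txt or to the console.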