OnlyText is a batch processing tool designed to extract clean, usable text from oral history transcripts. It removes title pages, front matter, headers, footers, page numbers, and other repeated page artifacts, then outputs clean UTF-8 .txt files.
How it works: OnlyText identifies the beginning of the transcript by detecting the first likely speaker label followed by a colon, such as FLEMING:, Jewell:, Malcolm Jewell:, Interviewer:, or Speaker 1:. Everything before that point is removed.
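The detection step described above can be sketched roughly as follows. This is a minimal illustration, not OnlyText's actual code: the pattern SPEAKER_LABEL and the function stripFrontMatter are hypothetical names, and the rule they encode (up to four capitalized words or numbers at the start of a line, followed by a colon) is an assumption about what "likely speaker label" means.

```typescript
// Hypothetical pattern (an assumption, not OnlyText's real rule): one to four
// capitalized words or numbers at the start of a line, followed by a colon.
// Matches "FLEMING:", "Jewell:", "Malcolm Jewell:", "Interviewer:", "Speaker 1:".
const SPEAKER_LABEL = /^\s*[A-Z][A-Za-z.'-]*(?:\s+[A-Z0-9][A-Za-z0-9.'-]*){0,3}:/;

// Drops every line that precedes the first line starting with a speaker label.
function stripFrontMatter(text: string): string {
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex((line) => SPEAKER_LABEL.test(line));
  // If no label is found, return the text unchanged rather than deleting everything.
  return start === -1 ? text : lines.slice(start).join("\n");
}
```

A pattern this simple would also match a title-page line such as "Date: 1985", so a real implementation would likely need extra checks; the sketch only shows the basic idea of cutting at the first label.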
Important: This tool is designed for oral history transcripts that use speaker labels followed by a colon. It may not work correctly on documents that do not follow this convention.
Your data stays private: This tool processes transcript files entirely within your web browser. Nothing is uploaded to any server, and nothing is saved.
Supported input: .docx, .txt, or text-based .pdf files. DOCX conversion uses Mammoth.js; PDF text extraction uses PDF.js. Scanned or image-only PDFs are not supported.
Choose one processed file to preview both the original input and the cleaned TXT output.
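The mapping from supported file types to extraction paths can be sketched as a small lookup. This is an illustrative assumption about how routing might be organized, not the tool's actual code: the Extractor type and the chooseExtractor function are hypothetical names, and the sketch only picks a path by file extension without calling Mammoth.js or PDF.js.

```typescript
// Hypothetical labels for the three extraction paths named in the text.
type Extractor = "mammoth" | "pdfjs" | "plain-text";

// Picks an extraction path from the file extension (case-insensitive).
// Returns null for unsupported types. Note that a scanned, image-only PDF
// still has a .pdf extension, so it cannot be ruled out at this stage.
function chooseExtractor(fileName: string): Extractor | null {
  const ext = fileName.toLowerCase().split(".").pop();
  switch (ext) {
    case "docx":
      return "mammoth"; // DOCX conversion via Mammoth.js
    case "pdf":
      return "pdfjs"; // text extraction via PDF.js
    case "txt":
      return "plain-text"; // read directly as text
    default:
      return null;
  }
}
```

Returning null instead of throwing lets a batch run skip unsupported files and continue with the rest.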