unoffice

Reclaim text from office documents
git clone https://logand.com/git/unoffice.git/

unodt (288B)


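#!/usr/bin/env bash
# Print the text of an OpenDocument Text (.odt) file given as $1.
#set -euo pipefail
# Dump all archive members, keep the lines that contain paragraphs, start
# a new line at each opening <text:p ...> tag (\n in the replacement works
# in GNU sed), then strip the remaining tags and decode the XML entities.
# &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<"; its
# replacement is written \& because a bare & means "the whole match".
unzip -p "$1" \
    | grep -a '<text:p' \
    | sed 's/<text:p[^<\/]*>/\n/g' \
    | sed 's/<[^<]*>//g' \
    | sed 's/&lt;/</g' \
    | sed 's/&gt;/>/g' \
    | sed "s/&apos;/'/g" \
    | sed 's/&quot;/"/g' \
    | sed 's/&amp;/\&/g' \
    | cat -s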
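
The entity-decoding stage of the pipeline can be tried in isolation. The following is a minimal sketch, not part of the repository: the function name `decode_entities` is hypothetical, and the five substitutions are folded into one `sed` call for brevity. In a `sed` replacement an unescaped `&` stands for the whole matched text, so a literal ampersand has to be written `\&`.

```shell
#!/usr/bin/env bash
# decode_entities: hypothetical helper, not part of unoffice.
# Reads text on stdin and decodes the five predefined XML entities.
decode_entities() {
    # &amp; is handled last so that "&amp;lt;" decodes to "&lt;"
    # rather than being decoded a second time into "<".
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e "s/&apos;/'/g" \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g'
}
```

For example, `printf '%s\n' 'Tom &amp; Jerry &lt;3' | decode_entities` should print `Tom & Jerry <3`.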