Embed arbitrary files into a PDF via "unoconv"

Discussion in 'Networking and Security' started by spy, Dec 6, 2022.

  1. spy

    I came up with this hack in order to transfer Python code to Interactive Brokers tech support. Support reps usually just tell you to reboot your computer. But sometimes, if you've found a legitimate problem in their code, they are willing to work closely with you, especially if you communicate it clearly. It may take forever, and they may say it won't get fixed, but at least they're aware of the problem ;-)

    As Python coders know, white-space indentation denotes code blocks. Coming from a C background there's a good reason this annoys me. With C, if your indentation gets mangled during a copy and paste (as is often the case) you can just re-indent the code with a "pretty print" program. After all, braces clearly denote code blocks in C. With Python though, if the same thing happens... you've introduced bugs into the code. There's no easy way to pretty print Python that I know of. This can be a PITA even with small code files.

    Furthermore... IB restricts the types of files you can attach to a tech support ticket. Presumably the reason for this is security, which I can understand. Nonetheless, it means that neither a .txt nor .py file can be uploaded. They do allow image and PDF files however.

    I don't agree with this policy 100%, since there are security vulnerabilities in image and PDF viewers too. But we still have to live with the restrictions. So when I wanted to send them a Python file (which is basically plain text) and be sure white-space bugs wouldn't be introduced via copy/paste... I discovered "unoconv", the universal office converter:

    "unoconv" is a command line utility that can convert any file format that LibreOffice can import, to any file format that LibreOffice is capable of exporting.

    With this we can easily and quickly create a PDF document from a simple text file.

    We still can't just create a PDF of the Python source directly, because copying and pasting from the PDF may screw with the leading white-space :-( We want to be assured that copy/pasting won't ruin meaningful content, i.e. the leading white-space. Embedding the text in an image isn't a viable solution either, since the recipient would need OCR software to get it back out.

    Ultimately the recipient must be able to open the allowed file type (PDF), know what to do with it at a glance, and be able to copy/paste the contents as usable code. Just a little more creativity should get us there.

    To achieve this, we gzip and base64 encode the Python. This becomes our "payload". Then, we put this payload in a bash script where we can ensure leading white-space is non-existent or benign. The PDF can be created from that instead.

    For example, let's say we have the following demo.py file:
    Code:
    #!/usr/bin/env python
    
    import sys
    
    print("You are using Python {}.{}.".\
          format(sys.version_info.major, sys.version_info.minor))
    if not sys.version_info.major == 3 and sys.version_info.minor >= 6:
        print("Python 3.6 or higher is required.")
        sys.exit(1)
    
    Obviously the leading white-space matters here.

    To create a PDF file that we can attach to our support ticket, we first generate a payload based off demo.py:
    Code:
    cat demo.py | gzip | base64 > payload.sh
    The payload.sh file now contains a bunch of text. It may look like gibberish, but it's just our demo.py in a form that travels more easily through assorted tunnels.
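
    Before going any further it may be worth confirming that the payload decodes back to the exact original. The following is only a minimal sketch: it assumes GNU coreutils ("base64 -d") and gzip's "zcat", and the scratch directory and stand-in demo.py exist purely to keep the snippet self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the real demo.py (the indentation is what we care about).
printf 'import sys\nif True:\n    print("indented")\n' > demo.py

# Same pipeline as in the post: compress, then base64 encode.
cat demo.py | gzip | base64 > payload.sh

# Decode and compare byte-for-byte with the original.
if base64 -d payload.sh | zcat | cmp -s - demo.py; then
    roundtrip="ok"
else
    roundtrip="MISMATCH"
fi
echo "round trip: $roundtrip"
```

    If the comparison fails at this stage, the problem is in the local tools rather than in anything the PDF did to the text.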

    Now, we can edit the payload.sh to add some simple human readable instructions. Put this at the very top of payload.sh:
    Code:
    #!/usr/bin/env bash
    
    cat > /dev/null <<NOTES
    
    Hi! Copy and paste everything you see in this PDF into a .sh file.
    Then, you can run it and our Python demo code will be printed out.
    NOTES
    
    (base64 -d | zcat) << PAYLOAD
    
    Then place a line containing only PAYLOAD at the very bottom to mark the end of the heredoc.

    Your payload.sh file should ultimately look like this:
    Code:
    #!/usr/bin/env bash
    
    cat > /dev/null <<NOTES
    
    Hi! Copy and paste everything you see in this PDF into a .sh file.
    Then, you can run it and our Python demo code will be printed out.
    NOTES
    
    (base64 -d | zcat) << PAYLOAD
    H4sIANvGj2MAA3WOwQqCUBBF9+8rbrZRiCchtAjsG9oGQRg+cwJnbN5TkujfU3FXDbOae89w1qu0
    85peiVPHPdoh1MLGUNOKBvjBG9MqcYijk3Qo1KHzxDcc5yJebztuZM8G81SiTRHikbO9U0/CF+JK
    bFPcRTf4vhOLJomhCizhRz5xyHNkKLj8w+OQY7efDRbXxS6zO4xxTbfaKchD3aMjdaWNkrk+/XNP
    CvE2MR/PFp6BCgEAAA==
    PAYLOAD
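
    The hand editing could also be scripted. What follows is only a sketch under a few assumptions: "make_payload" is a hypothetical helper name (not part of any tool), the note text is just the example wording from this post, and GNU base64 plus gzip are assumed to be installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_payload FILE: print a self-extracting script for FILE on stdout.
# (Hypothetical helper, for illustration only.)
make_payload() {
    local src=$1
    # Quoted delimiter: the header is copied literally, nothing is expanded.
    cat <<'HEADER'
#!/usr/bin/env bash

cat > /dev/null <<NOTES

Hi! Copy and paste everything you see in this PDF into a .sh file.
Then, you can run it and our Python demo code will be printed out.
NOTES

(base64 -d | zcat) << PAYLOAD
HEADER
    # The payload itself, followed by the line that closes the heredoc.
    gzip -c < "$src" | base64
    echo "PAYLOAD"
}

# Try it on a small stand-in file in a scratch directory.
workdir=$(mktemp -d)
printf 'if True:\n    print("indented")\n' > "$workdir/demo.py"
make_payload "$workdir/demo.py" > "$workdir/payload.sh"

# Running the generated script should reproduce the input exactly.
bash "$workdir/payload.sh" > "$workdir/out.py"
cmp "$workdir/demo.py" "$workdir/out.py" && echo "payload.sh reproduces demo.py"
```

    The base64 alphabet contains no "$", backtick or backslash, so leaving the PAYLOAD delimiter unquoted does not risk any shell expansion inside the payload.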
    
    Finally, convert the payload.sh into a PDF with unoconv:
    Code:
    unoconv -o message.pdf payload.sh
    View the message.pdf you created to make sure the contents resemble the payload.sh file.

    When your recipient opens message.pdf they'll be able to read it and copy/paste the contents into a shell script. Then they can execute that script to print the demo Python code (or redirect it to a file). With a little luck everything will unravel nicely. Obviously you can put other types of files in the payload too.
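
    If you want the recipient to be able to confirm that nothing was damaged along the way, one option is to quote a checksum of the original file in the notes. A rough sketch, assuming "sha256sum" from GNU coreutils exists on both ends; the file names are placeholders and the notes section is left out to keep it short:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'if True:\n    print("indented")\n' > demo.py

# Sender: record the checksum of the original file.
sent_sum=$(sha256sum < demo.py | cut -d' ' -f1)

# Sender: build a minimal self-extracting script (notes omitted for brevity).
{
    echo '(base64 -d | zcat) << PAYLOAD'
    gzip -c < demo.py | base64
    echo 'PAYLOAD'
} > payload.sh

# Recipient: run the pasted script, redirect to a file, compare checksums.
bash payload.sh > received.py
received_sum=$(sha256sum < received.py | cut -d' ' -f1)

if [ "$sent_sum" = "$received_sum" ]; then
    echo "checksums match"
else
    echo "checksums differ" >&2
fi
```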

    You may be thinking there isn't a need to wrap the payload with our shell code. After all, a PDF can be created from the payload directly. But most recipients won't know what to do with base64 encoded text alone; it's confusing gibberish without additional context.

    Another alternative is to generate a PDF file from a shell archive. However, our tiny custom bash script has the benefit of providing some human readable content directly in-line (via heredoc or you could use a comment instead, but heredocs are a bit nicer IMHO). In this way the user can open the PDF and have some introduction to what's going on. This is a trade-off worth considering.

    Finally... I know it's a somewhat complex process just to transfer a file. Unfortunately, sometimes, working around questionable security restrictions requires this kind of rigmarole. Please don't shoot the messenger!

    GL/HF with this fresh approach to a rather old endeavor... sending "a message in a bottle".
     
    Last edited: Dec 6, 2022
  2. MarkBrown

    slick as all get out...
     
  3. spy

    Thanks!
     
  4. I agree with @MarkBrown - that's some slick steganography. :) It's also an awesome demonstration of why there is so much duplication of tools in the UNIX world (TMTOWTDI) - once you have this incredible wealth of tools, and can build anything the way you like it, the urge to do so is just irresistible!

    I don't know if this would have saved you any effort, but this - or at least the "putting the message in the bottle" part - is exactly what 'shar' was designed for (part of the 'sharutils' package in Linux.) Not Python specifically, but preservation of whitespace and non-printable characters so they wouldn't get screwed up by being sent through email (as I recall it.) 'shar' creates a shell-executable archive that outputs the file in its original form when executed. As you can probably guess, this thing goes way back, to the days before easy access to good compression utilities.
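
    For anyone who wants to compare, a shar round trip might look roughly like the sketch below. It assumes the "shar" command from sharutils is installed (the snippet skips itself otherwise) and uses a stand-in demo.py in a scratch directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'if True:\n    print("indented")\n' > demo.py

if command -v shar > /dev/null 2>&1; then
    # Pack demo.py into a shell archive (shar writes the archive to stdout).
    shar demo.py > demo.shar 2> /dev/null

    # Unpack in a separate directory by running the archive with sh.
    mkdir unpacked
    (cd unpacked && sh ../demo.shar > /dev/null 2>&1)

    cmp demo.py unpacked/demo.py && echo "shar round trip ok"
    # For comparison with a hand-rolled script: how long is the archive?
    echo "archive lines: $(wc -l < demo.shar)"
    shar_status="checked"
else
    echo "shar (sharutils) is not installed; nothing to demonstrate"
    shar_status="skipped"
fi
```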

    As a minor note, 'unoconv' is great but requires installing all the LibreOffice libs, something like 300MB+ worth of them. I do most of my Linux work via ssh to a terminal these days - very little X stuff on that machine - but I've been using text2pdf for over a decade, and it doesn't need anything but the native C libs. If you want it, grab the source and the Makefile, run 'make' and 'make install', and - Shazaam! - you've got a text to PDF converter, about 16k stripped.

    http://www.eprg.org/pdfcorner/text2pdf/
     
  5. spy

    An API is more akin to a specification, the software that implements it is something different. This is why, for example, you can use one version of the TWS API with a different version of the gateway/client software. Also why you download them separately, the API from here but the client software from here.

    The github.io project you're referring to seems to be for a new "beta" version of the API, not the actual client software or IB "gateway". So, it'd probably be inappropriate to report bugs with production versions of their client software to an API project, let alone a beta project.

    Generally, if you think there is a bug in the client/gateway software, you should get in touch with their customer support so they can help you resolve it. More often than not the problem is user error. IB's support will help you determine that. They can also help determine if you've actually found a bug in their stuff or your stuff. And, if it's in their software, they'll usually thank you and prioritize it.

    Regardless, you can still use the technique presented to bypass upload restrictions in many other similar situations too. It's mostly useful "in a pinch". If I were you I'd just tuck it into the memory banks in case you need it later ;)
     
  6. Good attempt to save the argument, but your initial post refers to their code in Python, which is the API. Their gateway is written in Java, in case you want to know.

    Just use the GitHub repo to report errors; what you've done is a convoluted way to complicate things. They will most likely ignore your PDF.
     
  7. spy

    Yeah, I mention sharutils as an alternative towards the bottom of the post. They're great but the only thing that bothers me is they're "too good" in a sense. If the PDF contained a shar script then the recipient could very easily get scared stiff when they looked at it. I presume you've seen the amount of code that a shar contains... it's quite complete, covers a billion corner cases, and may be overwhelming. I agree it is the proper(tm) way to do things, but quick and dirty works sometimes too.

    I had text2pdf somewhere... it was the first thing I reached for, but couldn't find it when I needed it! Lol, that's how I stumbled on unoconv. It's more bloated I'm sure. But again:

    Code:
    > text2pdf
    
    Command 'text2pdf' not found, did you mean:
    
      command 'text2odf' from deb libopenoffice-oodoc-perl
      command 'texi2pdf' from deb texinfo
    
    Try: sudo apt install <deb name>
    
    Oh, now I remember.... I couldn't find text2pdf and thought for a moment that I should text2ps and then ps2pdf... but, again I hear what you're saying about "so much duplication of tools in the UNIX world". So, decided to keep looking a bit.

    Anyway, I know you've been around the block a few times @BlueWaterSailor so the suggestion to get off my butt and use configure/make/make install is duly noted; thanks ;) I'm actually a bit curious now to see how the PDFs generated would differ (between text2pdf and unoconv).

    P.S. "steganography"... great word usage! Haven't heard that one in a looooong time.
     
    Last edited: Dec 6, 2022
  8. spy

    Thanks, I wasn't aware of this. I thought maybe it was Clojure.
     
  9. Anyway, ignore the bugs that they have in their Python examples.
    I use the C++ and C# ones and ended up writing my own library. They are full of bugs. They provide that code as an example but you have to tame it in order to use it.

    If you want to use Python, go with a robust library like:
    https://github.com/erdewit/ib_insync

    They have a very active community at:
    https://groups.io/g/insync

    It will save you a few headaches.
     