Idea in extracting the title of an article?

Discussion in 'App Development' started by blueraincap, Mar 14, 2024.

  1. 2rosy

    be grateful. without threads like this there would be no ET
     
    #21     Mar 14, 2024
    Baron likes this.
  2. ph1l

    Here is a crude method that works sometimes:
    Code:
    #!/bin/bash
    
    if [[ $# -ne 1 ]]
    then
        echo 1>&2 "Usage: $0 file.pdf
        finds the pdf file's title, and outputs it to the standard output"
        exit 1
    fi
    
    pdftotext -layout "${1}" - |
    perl -n -e 'use warnings; use strict;
    our $processingTitle; our @title;
    my $l = $_;
    $l =~ s/^\f//; # remove a leading form feed (page break emitted by pdftotext)
    if ( $processingTitle )
    {
        if ( $l =~ /^\s+$/ )
        {
            last;
        }
    
        # \S.* rather than \S.+ so a one-character line still matches and $parts is defined
        my ( $parts ) = $l =~ /^\s*(\S.*)/;
        $parts =~ s/\s+$//;
        push (@title, $parts);
    }
    elsif ( $l =~ /^\s*(\S.*)/ )
    {
        $processingTitle = 1;
        my $parts = $1;
        $parts =~ s/\s+$//;
        push (@title, $parts);
    }
    END {
    if ( scalar(@title) == 0 )
    {
        print "NO TITLE FOUND\n";
    }
    else
    {
        print join(" ", @title), "\n";
    }
    }'
    
    
    For example, assuming the environment has bash, pdftotext, and perl, and the script above is saved as an executable named findpdftitle on the PATH:
    https://assets.super.so/e46b77e7-ee...iles/2f2fc428-925c-4041-9f6d-bc387d904820.pdf
    Code:
    $ findpdftitle 2f2fc428-925c-4041-9f6d-bc387d904820.pdf
    Commodity Option Implied Volatilities and the Expected Futures Returns
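
The heuristic in the script above is simply "take the first block of non-blank lines and join them with spaces". As a rough sketch for anyone without perl, the same idea can be written in plain bash. The function name first_text_block is made up for illustration and is not part of the original script, and like the original it only works when the title happens to be the first text block on the first page.

```shell
#!/bin/bash
# Sketch only: print the first block of non-blank lines read from stdin,
# joined by single spaces. first_text_block is a hypothetical helper name.
first_text_block() {
    local line title="" started=0
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line#$'\f'}                        # drop a leading form feed (page break)
        line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
        line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
        if [[ -z $line ]]; then
            (( started )) && break                # blank line after the title: stop
            continue                              # blank line before the title: skip
        fi
        started=1
        title+="${title:+ }$line"                 # join title lines with one space
    done
    if [[ -n $title ]]; then
        printf '%s\n' "$title"
    else
        printf 'NO TITLE FOUND\n'
    fi
}
```

It would be used the same way as the perl part of the pipeline, e.g. pdftotext -layout file.pdf - | first_text_block.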
    
     
    #22     Mar 14, 2024
  3. Quanto

    For the archives :) :

    blueraincap.png
     
    #23     Mar 14, 2024
  4. hilmy83

    You either:

    1. Lost money
    2. Got cheated on
    3. Off your meds
    4. All the above..


    upload_2024-3-14_11-26-33.png
     
    #24     Mar 14, 2024
    beginner66, semperfrosty and Quanto like this.
  5. Quanto

    And what if he has no clue about Linux/Unix or bash? :)
     
    #25     Mar 14, 2024
  6. Quanto

    I think he was "high", i.e. doped, under the influence of drugs and alcohol...
     
    #26     Mar 14, 2024
    hilmy83 likes this.
  7. hilmy83

    Until some new nick with 3 posts and 0 likes shows up and opens a new thread asking "so, I have a friend that needs coding help" lol
     
    #27     Mar 14, 2024
    semperfrosty likes this.
  8. Quanto

    When saving such PDF downloads, I always also save the web page where I found it.
    Web pages of course have a filename as well, and this one is mostly a real title, not something cryptic like 194567.pdf :).
    Later, when I look at the directory listing sorted by datetime, I can see which HTML doc accompanies the PDF, open it in the browser, and so find out what the PDF is about, incl. the title, and where I downloaded it from...
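
The "look at the listing sorted by datetime" step can be automated. Here is a minimal bash sketch that, for a given PDF, prints the saved web page in the same directory whose modification time is closest to the PDF's. The function name nearest_html is invented for this example, and stat -c %Y assumes GNU coreutils (Linux); other systems need a different stat invocation.

```shell
#!/bin/bash
# Sketch only: find the .html/.htm file saved closest in time to a PDF.
# nearest_html is a hypothetical helper; `stat -c %Y` assumes GNU coreutils.
nearest_html() {
    local pdf=$1 dir pdf_time f t delta best="" best_delta=-1
    dir=$(dirname -- "$pdf")
    pdf_time=$(stat -c %Y -- "$pdf") || return 1   # mtime in seconds since epoch
    for f in "$dir"/*.html "$dir"/*.htm; do
        [[ -e $f ]] || continue                    # skip glob patterns that matched nothing
        t=$(stat -c %Y -- "$f")
        delta=$(( t > pdf_time ? t - pdf_time : pdf_time - t ))
        if (( best_delta < 0 || delta < best_delta )); then
            best=$f
            best_delta=$delta
        fi
    done
    [[ -n $best ]] && printf '%s\n' "$best"        # no output if no web page was found
}
```

Calling nearest_html 194567.pdf would then print the path of the accompanying page, whose filename usually carries the real title.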
     
    #28     Mar 14, 2024
  9. ajacobson

    Baron should ban the poster, and at the bare minimum we should all block him.
     
    #29     Mar 14, 2024
  10. Zwaen

    It’s weird, he seemed like a normal guy before. The world is changing.
     
    #30     Mar 14, 2024