Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BBC: Blead Breaks File::Tabular #22490

Open
cjg-cguevara opened this issue Aug 8, 2024 · 11 comments
Open

BBC: Blead Breaks File::Tabular #22490

cjg-cguevara opened this issue Aug 8, 2024 · 11 comments
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) Closable? We might be able to close this ticket, but we need to check with the reporter

Comments

@cjg-cguevara
Copy link

This is a bug report for perl from "Carlos Guevara" [email protected],
generated with the help of perlbug 1.43 running under perl 5.41.3.


BBC: Blead Breaks File::Tabular

Please see http://fast-matrix.cpantesters.org/?dist=File::Tabular


Flags

  • category=core
  • severity=low

Perl configuration

Site configuration information for perl 5.41.3:

Configured by cpan at Wed Aug  7 22:42:44 EDT 2024.

Summary of my perl5 (revision 5 version 41 subversion 3) configuration:
  Commit id: e2b4c6357925d2269403aff2d563ed245a1213ab
  Platform:
    osname=linux
    osvers=5.14.0-427.28.1.el9_4.x86_64
    archname=x86_64-linux
    uname='linux cjg-rhel9 5.14.0-427.28.1.el9_4.x86_64 #1 smp preempt_dynamic fri jul 19 14:40:47 edt 2024 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dprefix=/home/cpan/bin/perl -Dscriptdir=/home/cpan/bin/perl/bin -Dusedevel -Duse64bitall'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='11.4.1 20231218 (Red Hat 11.4.1-3)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /usr/lib64 /usr/local/lib64
    libs=-lpthread -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib64/libc.so.6
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.34'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


---
@INC for perl 5.41.3:
    /home/cpan/bin/perl/lib/site_perl/5.41.3/x86_64-linux
    /home/cpan/bin/perl/lib/site_perl/5.41.3
    /home/cpan/bin/perl/lib/5.41.3/x86_64-linux
    /home/cpan/bin/perl/lib/5.41.3

---
Environment for perl 5.41.3:
    HOME=/home/cpan
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_ALL=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/cpan/bin/perl/bin:/home/cpan/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
@tonycoz
Copy link
Contributor

tonycoz commented Aug 8, 2024

Probably 471c834

@jkeenan
Copy link
Contributor

jkeenan commented Aug 8, 2024

Probably 471c834

Confirmed through bisection.

$ gitshowf 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
commit 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
Author:     Lukas Mai <[email protected]>
AuthorDate: Sun Aug 4 19:40:32 2024 +0200
Commit:     Lukas Mai <[email protected]>
CommitDate: Tue Aug 6 00:59:48 2024 +0200

    open: only treat literal undef as special
...

@mauke, can you take a look? Thanks.

@jkeenan jkeenan added BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) and removed Needs Triage labels Aug 8, 2024
@mauke
Copy link
Contributor

mauke commented Aug 8, 2024

I've opened a bug ticket with two alternative suggestions for fixing the code: https://rt.cpan.org/Ticket/Display.html?id=154767

IMHO the test code in File::Tabular relies on a bug in perl (#22385). It calls open($_[0], $_[1], $_[2]) and expects open to create a temporary file, even though the filename argument is manifestly a variable ($_[2]) and not a literal undef. But because $_[2] (in sub _open) happens to be an alias for $_[1] (in sub new), which in turn is an alias for undef in the constructor call in the test file, this used to trick perl into creating a temp file anyway.

@haarg
Copy link
Contributor

haarg commented Aug 9, 2024

This problem will impact users of autodie as well.

I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me. A warning could be issued if a mode other than +> was used.

@jkeenan
Copy link
Contributor

jkeenan commented Aug 9, 2024

This problem will impact users of autodie as well.

Could you provide an example of the problem autodie users will encounter?

@mauke
Copy link
Contributor

mauke commented Aug 10, 2024

I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me.

It would be more consistent, but it would also fail in annoying (and silent) ways. For example, let's say you extract the name of a file to be processed from your configuration and try to open it:

my $filename = $config->{input_file};
open my $fh, "<", $filename
    or die "Can't open $filename: $!";

If open were to treat any undef as a temp file indicator and $config->{input_file} is not set (for any reason), this would silently open an empty file (instead of failing loudly). For my taste, "any undef means use a temp file instead" is just a bit too much magic given how easy it is to inadvertently end up with undef in a variable or expression.

The current behavior of released perls (before my changes in #22385) catches the case above reliably, but is otherwise hard to characterize. The code simply checks if (sv == &PL_sv_undef), which is implementation details of the interpreter leaking through. For example, if we take the example above and remove the $filename variable, …

open my $fh, "<", $config->{input_file}
    or die "Can't open $config->{input_file}: $!";

… the behavior now depends on whether the input_file key exists in %$config or not. If it doesn't exist, perl will open a temporary file. If it exists but is undef, perl will fail (as before). The behavior of any expression in the third argument of open depends on whether it returns any undef value in general or the canonical undef value in particular. (I don't think this difference is even documented anywhere; we only have "returns undef on error" or similar.)

So after rejecting the status quo (inconsistent, hard to describe without resorting to implementation details) and a simple defined() check at runtime (consistent, but even more error prone and potentially incompatible with some existing code), I decided to turn back all the way to static compile-time checks. There is precedent for this sort of thing in Perl:

  • Built-in functions in general get to have whatever syntax they want. Some of it can be mimicked with prototypes, other parts cannot.
  • system and exec switch to different behavior if given a bare block as the first ("indirect object") argument.
  • <...> decides whether to be glob() or readline() based purely on syntax and in a way no other operator does. For example, <foo> and <$foo> mean readline(foo) and readline($foo), respectively, but <${foo}> and <$ foo> both mean glob($foo).
  • eof (no argument) and eof() (empty parens) do different things.
  • if (1 .. 10) does something different from my ($x, $y) = (1, 10); if ($x .. $y).

And so now open is syntactic, too: "If the third argument is a literal undef, you get a temporary file" is a simple rule that matches what perldoc -f open describes. The check is tighter than before, so there are fewer chances (hopefully none) for you to get a temporary file when you didn't expect it (which is what #22385 is about). It is technically a change in the behavior of open, but only in cases that (depending on how you read perldoc -f open) had undefined behavior or had behavior that contradicted the documentation.

@haarg
Copy link
Contributor

haarg commented Aug 10, 2024

Opening a temp file with a < mode makes no sense, so for the example you gave, a warning could easily be added.

Various core functions do have special parsing, but we generally consider those things warts to avoid in modern designs. And some of them are clearly unique by their syntax alone. system and exec with a block are obviously not standard function calls. <...> is also obviously unique syntax. open looks like a normal function call, so having not behave like one is confusing.

There is also precedence for odd compile time formulations being removed. Prior to perl 5.18, splitting on whitespace using split ' ' required the argument to be a literal string. perl5180delta

@haarg
Copy link
Contributor

haarg commented Aug 10, 2024

This problem will impact users of autodie as well.

Could you provide an example of the problem autodie users will encounter?

Until this change, you could write:

use autodie;
open my $fh, '+>', undef;

This would open a file handle to a temp file. Since autodie overrides the open function, the magic added in #22465 won't apply. autodie is implemented in perl using normal functions, and its wrapper calls open($_[0], $_[1], @_[2 .. $#_]). The only way it could reimplement the new behavior of perl's open is by plugging in to the parser.

@damil
Copy link

damil commented Aug 10, 2024

Hi there,
I'm joining this discussion because I'm the author of File::Tabular and became aware of the issue through the ticket opened by MAUKE.
At first I found very surprising that a raw undef is not the same thing as an undefined variable, and thought it would be better to generalize the behaviour. But after reading the whole discussion in #22385 I'm now convinced by MAUKE's argument that a call to open() with an undefined variable is not the same intention than a raw undef and therefore should complain and fail.

Regarding File::Tabular, it's really not an issue if the behavior changes in 5.40 : the tempfile feature of open() was just convenient for tests, it's not essential to the module. However, if I understand correctly, replacing line 5 in

sub _open { # CORE::open is prototyped, so it cannot take args from an array. This is a workaround
  my $self   = shift;
  my $result = (ref $_[1] eq 'GLOB') ? $_[0] = $_[1]                         :
               @_ > 3                ? open($_[0], $_[1], $_[2], @_[3..$#_]) :
               @_ > 2                ? open($_[0], $_[1], $_[2])             :
                                       open($_[0], $_[1]);

by

               @_ > 2                ? open($_[0], $_[1], $_[2] // undef)             :

would solve the issue, correct ? Maybe a similar approach could be done for autodie and Fatal.pm.

Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.

@haarg
Copy link
Contributor

haarg commented Aug 11, 2024

               @_ > 2                ? open($_[0], $_[1], $_[2] // undef)             :

No, this will not work. The third parameter is an expression, not a literal undef, so it won't trigger the temp file behavior. And even if it did, it would change how the parameters are handled vs the core open, which isn't how autodie is meant to work.

Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.

I agree that this isn't a great interface for doing this.

@jkeenan
Copy link
Contributor

jkeenan commented Sep 17, 2024

File-Tabular is now passing on CPANtesters, @damil having released a new CPAN version on August 11. That would make this ticket closable, but there was a lot of discussion about larger issues between @mauke and @haarg. How should we proceed?

@jkeenan jkeenan added the Closable? We might be able to close this ticket, but we need to check with the reporter label Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) Closable? We might be able to close this ticket, but we need to check with the reporter
Projects
None yet
Development

No branches or pull requests

6 participants