LINE INPUT (file) enhancement discussion

Started by MrBcx, August 31, 2020, 07:02:23 PM

Previous topic - Next topic

MrBcx

Attached is a working prototype for the file form of the LINE INPUT statement.
This will allow static and dynamic variables to be used and will help guard against
variable memory overrun.  If incorporated into BCX, the changes to LINE INPUT will be
transparent and 100% backwardly compatible.

Test.bas can be translated with BCX 753 and compiled with everything.

Test.bas reveals an interesting behavior when the line being read is greater than the dynamic string allocation.

I don't know if the problem that creates is my problem to solve or not.  I'm leaning on the idea that every
application programmer has a responsibility to have an adequately sized variable, otherwise undefined
behavior. occurs.

All the compilers I tested yield the same output EXCEPT Pelles C.  And Pelles output also differs with itself
depending on whether you compile as 32bit or 64bit.  Go figure ...

My bottom line:

My prototype could be integrated into BCX very quickly, if we agree that individual application programmers
have a responsibility to adequately size their variables and perform fundamental testing on their applications.

Opinions?

Robert

Quote from: MrBcx on August 31, 2020, 07:02:23 PM
Attached is a working prototype for the file form of the LINE INPUT statement.
This will allow static and dynamic variables to be used and will help guard against
variable memory overrun.  If incorporated into BCX, the changes to LINE INPUT will be
transparent and 100% backwardly compatible.

Test.bas can be translated with BCX 753 and compiled with everything.

Test.bas reveals an interesting behavior when the line being read is greater than the dynamic string allocation.

I don't know if the problem that creates is my problem to solve or not.  I'm leaning on the idea that every
application programmer has a responsibility to have an adequately sized variable, otherwise undefined
behavior. occurs.

All the compilers I tested yield the same output EXCEPT Pelles C.  And Pelles output also differs with itself
depending on whether you compile as 32bit or 64bit.  Go figure ...

My bottom line:

My prototype could be integrated into BCX very quickly, if we agree that individual application programmers
have a responsibility to adequately size their variables and perform fundamental testing on their applications.

Opinions?

Hi MrBCX:

Compiling with MSVC 64 bit.

In test.bas, changing



DIM c$ * 51  ' Enough for 50 chars + terminating null



to



DIM c$ * 100  ' Enough for 50 chars + terminating null



causes no output. Nothing.

MrBcx

Hi Robert -- you changed the test. 
If you add a 4th line to Test.txt you will see all is well.

Remember ... LINE INPUT is usually found in a loop like:

WHILE NOT EOF(fp1) '  <<-- With no 4th line, the program ended prematurely.
LINE INPUT fp1,BlahBlah$
WEND

Test.bas was not written for that purpose ... it was to show partial read and truncation.


With c$ dim'd to 100 and a 4th line added to Test.txt, these are my results:

c:\Temp\LineInputFile>bc test -c

BCX BASIC to C/C++ Translator (c) 1999-2020 by Kevin Diggins
Version 7.5.4 (2020/08/31)  Compiled for 64-bit Windows using LLVM-CLANG
[Lines In: 34] [Lines Out: 237] [Statements: 17] [Time: 0.02 sec's]
BCX translated [Test.Bas] to [Test.Cpp] for a C++ Compiler


c:\Temp\LineInputFile>vc64 test
Compiling test With MSVC for 64-bit
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test.cpp
Linking test
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.

Completed

c:\Temp\LineInputFile>test
1-Now is the time for all BCX developers to come to the aid of their open source project.
2-Now is the time for all BCX developers to come to the aid of their open source project.
3-Now is the time for all BCX developers to come to the aid of their open source project.
4-Now is the time for all BCX developers to come to the aid of their open source project.

Press any key to continue . . .


Robert

Quote from: MrBcx on August 31, 2020, 09:49:58 PM
Hi Robert -- you changed the test. 
If you add a 4th line to Test.txt you will see all is well.

Remember ... LINE INPUT is usually found in a loop like:

WHILE NOT EOF(fp1) '  <<-- With no 4th line, the program ended prematurely.
LINE INPUT fp1,BlahBlah$
WEND

Test.bas was not written for that purpose ... it was to show partial read and truncation.


With c$ dim'd to 100 and a 4th line added to Test.txt, these are my results:

c:\Temp\LineInputFile>bc test -c

BCX BASIC to C/C++ Translator (c) 1999-2020 by Kevin Diggins
Version 7.5.4 (2020/08/31)  Compiled for 64-bit Windows using LLVM-CLANG
[Lines In: 34] [Lines Out: 237] [Statements: 17] [Time: 0.02 sec's]
BCX translated [Test.Bas] to [Test.Cpp] for a C++ Compiler


c:\Temp\LineInputFile>vc64 test
Compiling test With MSVC for 64-bit
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test.cpp
Linking test
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.

Completed

c:\Temp\LineInputFile>test
1-Now is the time for all BCX developers to come to the aid of their open source project.
2-Now is the time for all BCX developers to come to the aid of their open source project.
3-Now is the time for all BCX developers to come to the aid of their open source project.
4-Now is the time for all BCX developers to come to the aid of their open source project.

Press any key to continue . . .

Hi MrBCX:

I don't want you to think I'm kicking sand in your face but, with 100, I do get more than nothing by replacing mLineInputFile with BCX's good old but slightly broken LINE INPUT.

Test.txt

1-Now is the time for all BCX developers to come to the aid of their open source project.
2-Now is the time for all BCX developers to come to the aid of their open source project.
3-Now is the time for all BCX developers to come to the aid of their open source project.


test2.bas

DIM a$
DIM b$
DIM c$ * 100  ' Enough for some chars + terminating null
DIM d$

OPEN "test.txt" FOR INPUT AS fp1

LINE INPUT fp1, a$
LINE INPUT fp1, b$
LINE INPUT fp1, c$
LINE INPUT fp1, d$

? a$
? b$
? c$
? d$

CLOSE

PAUSE



Result



C:\Users\irwis\Downloads>test2
1-Now is the time for all BCX developers to come to the aid of their open source project.
2-Now is the time for all BCX developers to come to the aid of their open source project.
3-Now is the time for all BCX developers to come to the aid of their open source project.


Press any key to continue . . .



I know you don't like nothing, so don't change the 100 to 51 in the code above because that's what you'll get, nothing. A totally absolutely completely truncated nothing.

MrBcx

Robert,

The old way is using a hard-coded 2048 byte read, does not make use of _msize or sizeof, and can overflow static and dynamic buffers under the right circumstances.

I'm cool with keeping things the way they are, if that's what you're suggesting.  But I thought you, JC, and perhaps others hoped for a more robust implementation which is what I was aiming for.

I don't know why, using your new example,  the old way doesn't fail on EOF but the new way does.  If you ask me, failing on EOF, as what the new way appears to be doing, is exactly what it should be doing.  Your example has 4 LINE INPUT statements but Test.txt only came with 3 lines of text in it.

MrBcx

Thinking out loud ...

On the subject of partial read and truncation using mLineInputFile, I think an improvement would be to buffer the fgets into the system InputBuffer$ then copy those contents to the users' variable, governed by each variables'  _msize / sizeof. 

That way, in all but the rarest of instances, no viable DISK READ would ever be truncated (a partial disk read) but its data that gets copied to the user's variable might be, if the user failed to size it correctly.

Robert

#6
Quote from: MrBcx on August 31, 2020, 11:17:30 PM
Robert,

The old way is using a hard-coded 2048 byte read, does not make use of _msize or sizeof, and can overflow static and dynamic buffers under the right circumstances.

I'm cool with keeping things the way they are, if that's what you're suggesting.  But I thought you, JC, and perhaps others hoped for a more robust implementation which is what I was aiming for.

I don't know why, using your new example,  the old way doesn't fail on EOF but the new way does.  If you ask me, failing on EOF, as what the new way appears to be doing, is exactly what it should be doing.  Your example has 4 LINE INPUT statements but Test.txt only came with 3 lines of text in it.

Hi MrBCX;

I'm not suggesting anything more, less or different  than what you are trying to achieve. The change that was made to prevent buffer overflow was to have the fetch size no larger than the dimensioned size of the receiving buffer. The simplest solution was implemented. TheStaticString$ is 2048 so the fetch size will be and is now 2048. That can remain as the default.

I would like to add to this default a user definable variable or an automatically calculated one, replacing BcxStrSize in BCX 7.5.3 in line 15258. If it can not be calculated automatically then the BcxStrSize size could stay as the default and perhaps a new directive $LNPUTSIZE could be implemented, which, if defined, would replace the default with the user specified size. 

It would be nice to have this done automatically, when the user dynamically dimensions the receiving buffer to 100000 the fetch size is set to the same. This is what you are now trying to do and which I have attempted to do several times over the last few months since James suggested such a change. I couldn't make it work. I am hopeful that you can but if not there is the optional manual alteration of the fetch size as a possibilty.


jcfuller

Would there be any benefit using the WinApi instead of or alternative to the stdio file io.?

I used it in bc9 for both Fred's string class and char.

James


int LineInput(HANDLE, char*, int = 2048);
int fPrintS (HANDLE, char*);


OpenFile sFile$ For OUTPUT As hFile

fPrintS(hFile,sTmp$)

CloseHandle(hFile)

OpenFile sFile$ For APPEND As hFile

retval = LineInput(hFile,sTmp$)

This is the LineInput (it is c++ using "new") but could be adapted to dynamic $


int LineInput (HANDLE hFile, char*  sData, int ReadBufferSize)
{
    char*   ReadBuffer = {0};
    DWORD    dwBytesRead = {0};
    DWORD    dwPtr = {0};
    long     where = {0};
    int      j = {0};
    ReadBuffer = new char[ ReadBufferSize];
    dwPtr = SetFilePointer( hFile, 0, NULL, FILE_CURRENT);
    if(FALSE == ReadFile(hFile, ReadBuffer, ReadBufferSize, &dwBytesRead, NULL))
    {
        delete []ReadBuffer;
        return -3;
    }
    if(dwBytesRead == 0 )
    {
        delete []ReadBuffer;
        return -1;
    }
    {
        int      i;
        for(i = 0; i < dwBytesRead; i++)
        {
            if(ReadBuffer[i] == 10 )
            {
                where = i;
                j = where;
                if(ReadBuffer[i - 1] == 13 )
                {
                    where = i + 1;
                    j = i - 1;
                }
                break;
            }
        }
    }
    SetFilePointer(hFile, dwPtr + where, NULL, FILE_BEGIN);
    ReadBuffer[j]  = 0;
    strcpy(sData, ReadBuffer);
    delete []ReadBuffer;
    return 0;
}


MrBcx

#8
JC,

The first issue that I have with your suggestion is that the WinApi uses HANDLE and the C runtime uses FILE*.

Two way communication between them is an ugly, cumbersome PITA. 
For example, below is my recently upgraded SEEK runtime.  Besides using HANDLE instead of FILE*, Windows
also uses differently named constants and values -- eg:   FILE_BEGIN versus SEEK_SET


IF Use_Seek THEN
    FPRINT FP_WRITE,"void Seek (FILE* hFile_, LONGLONG FilePtr_)"
    FPRINT FP_WRITE,"{"
    FPRINT FP_WRITE,"  LARGE_INTEGER  NewPosition;"
    FPRINT FP_WRITE,"  NewPosition.QuadPart= FilePtr_ * sizeof(char);"
    FPRINT FP_WRITE,"  SetFilePointerEx((HANDLE)_get_osfhandle(_fileno(hFile_)),NewPosition,NULL,FILE_BEGIN);"
    FPRINT FP_WRITE,"}\n\n"
  END IF

In a perfect world, everything emitted by BCX would use one or the other but not both.


MrBcx

This is the roadmap that I'm considering for revising the LINE INPUT statement (the file version)

Preface:

* LINE INPUT has always been intended exclusively for text files.
* BCX will impose a maximum single line length of 1 MB.
* Attempting to read more than 1 MB will result in undefined behavior.
* NOTHING GOOD comes from partially reading a line of text from disk.

General Details:

1) InputBuffer$ (an internal variable) will be hardwired to 1MB + 1 byte

2) Disk reads (using fgets) shall be read into InputBuffer$

3) InputBuffer$ contents will be copied to the USER variable based on its ( _msize or sizeof ) allocation

4) If data in InputBuffer$ exceeds the user's variable allocation, ABORT with a Windows MessageBox
    alerting the user that his variable contains truncated data and suggest its allocation be increased or
    the data file be inspected for possible corruption. 





Robert

Quote from: jcfuller on September 01, 2020, 12:17:30 PM
Would there be any benefit using the WinApi instead of or alternative to the stdio file io.?

I used it in bc9 for both Fred's string class and char.

James


int LineInput(HANDLE, char*, int = 2048);
int fPrintS (HANDLE, char*);


OpenFile sFile$ For OUTPUT As hFile

fPrintS(hFile,sTmp$)

CloseHandle(hFile)

OpenFile sFile$ For APPEND As hFile

retval = LineInput(hFile,sTmp$)

This is the LineInput (it is c++ using "new") but could be adapted to dynamic $


int LineInput (HANDLE hFile, char*  sData, int ReadBufferSize)
{
    char*   ReadBuffer = {0};
    DWORD    dwBytesRead = {0};
    DWORD    dwPtr = {0};
    long     where = {0};
    int      j = {0};
    ReadBuffer = new char[ ReadBufferSize];
    dwPtr = SetFilePointer( hFile, 0, NULL, FILE_CURRENT);
    if(FALSE == ReadFile(hFile, ReadBuffer, ReadBufferSize, &dwBytesRead, NULL))
    {
        delete []ReadBuffer;
        return -3;
    }
    if(dwBytesRead == 0 )
    {
        delete []ReadBuffer;
        return -1;
    }
    {
        int      i;
        for(i = 0; i < dwBytesRead; i++)
        {
            if(ReadBuffer[i] == 10 )
            {
                where = i;
                j = where;
                if(ReadBuffer[i - 1] == 13 )
                {
                    where = i + 1;
                    j = i - 1;
                }
                break;
            }
        }
    }
    SetFilePointer(hFile, dwPtr + where, NULL, FILE_BEGIN);
    ReadBuffer[j]  = 0;
    strcpy(sData, ReadBuffer);
    delete []ReadBuffer;
    return 0;
}



Hi James:

Can it read a line longer than 2048 bytes? Is there a limit to how long a line it can read?


Robert

Quote from: MrBcx on September 01, 2020, 02:10:12 PM
This is the roadmap that I'm considering for revising the LINE INPUT statement (the file version)

Preface:

* LINE INPUT has always been intended exclusively for text files.
* BCX will impose a maximum single line length of 1 MB.
* Attempting to read more than 1 MB will result in undefined behavior.
* NOTHING GOOD comes from partially reading a line of text from disk.

General Details:

1) InputBuffer$ (an internal variable) will be hardwired to 1MB + 1 byte

2) Disk reads (using fgets) shall be read into InputBuffer$

3) InputBuffer$ contents will be copied to the USER variable based on its ( _msize or sizeof ) allocation

4) If data in InputBuffer$ exceeds the user's variable allocation, ABORT with a Windows MessageBox
    alerting the user that his variable contains truncated data and suggest its allocation be increased or
    the data file be inspected for possible corruption.

Hi MrBCX:

Check this out from the Carnegie-Mellon Software Engineering Institute and see if it can offer any useful ideas.

FIO20-C. Avoid unintentional truncation when using fgets() or fgetws()

https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152445

MrBcx

Robert - thanks for sharing the link.

1   The examples involving corrupt data files are not my concern.  GIGO.
2.  The example involving realloc to increase buffer size has already been discussed and tested.
3.  Likewise, the POSIX getline function.

Considering most everything discussed on this forum has a perceptible audience of 2.1 persons,
I fear that my enthusiasm for putting anymore effort into the public version of BCX is waning.



Robert

Quote from: MrBcx on September 01, 2020, 03:16:44 PM
Robert - thanks for sharing the link.

1   The examples involving corrupt data files are not my concern.  GIGO.
2.  The example involving realloc to increase buffer size has already been discussed and tested.
3.  Likewise, the POSIX getline function.

Considering most everything discussed on this forum has a perceptible audience of 2.1 persons,
I fear that my enthusiasm for putting anymore effort into the public version of BCX is waning.

I hear you, brother.

Maybe later ...

Take care.

jcfuller

Robert,
  You need to know your file data (has CRLF) or it will go off into lala I suppose. Alsobe aware of your buffer size.

The proto is:
  int LineInput(HANDLE, char*, int = 2048);

I wrote it before Fred added stdio to his TCLib.

James