Author Topic: DeRangeD  (Read 262 times)

Robert

  • Full Member
  • ***
  • Posts: 212
    • View Profile
DeRangeD
« on: February 24, 2020, 04:00:04 AM »
I wrote and compiled DeRangeD.bas, which creates a text file with a user-defined number of lines. Each line has a random length within a user-defined minimum and maximum.

For example, the command line invocation:

DeRangeD 200 3000 1000000

would create a 1,000,000-line text file in which each line has a random length between 200 and 3000 characters and is made up of randomly chosen characters between ASCII 32 (space) and 126 (tilde).
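
For readers who want to see the idea in code, here is a minimal C++ sketch of such a generator. It is not the code from DeRangeD.bas: the function name write_random_lines, the seed parameter, and the use of std::mt19937 with uniform distributions are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>

// Hypothetical helper (not taken from DeRangeD.bas): writes `lineCount` lines
// to `path`. Each line gets a random length in [minLen, maxLen] and is filled
// with random printable ASCII characters from 32 (space) to 126 (tilde).
bool write_random_lines(const std::string& path, std::size_t minLen,
                        std::size_t maxLen, std::size_t lineCount,
                        unsigned seed = 12345u) {
    assert(minLen <= maxLen);  // precondition: a valid length range

    std::ofstream out(path, std::ios::binary);
    if (!out) return false;

    std::mt19937 rng(seed);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<std::size_t> lenDist(minLen, maxLen);
    std::uniform_int_distribution<int> charDist(32, 126);

    std::string line;
    for (std::size_t i = 0; i < lineCount; ++i) {
        line.resize(lenDist(rng));
        for (char& c : line) c = static_cast<char>(charDist(rng));
        out << line << '\n';  // binary mode: a single LF, no CR
    }
    return static_cast<bool>(out);
}
```

Calling write_random_lines("DeRangeD.txt", 200, 3000, 1000000) would mirror the command line above. As a rough cross-check of the file size reported in this post, and assuming uniformly distributed lengths plus a two-byte CR/LF per line, the average line would be about 1600 + 2 bytes, which gives roughly 1.6 GB for a million lines. The sketch itself writes a bare LF, so its output would be slightly smaller.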

On a Ryzen7 1800X, running

DeRangeD 200 3000 1000000

took less than a minute to create the 1 million line file, which was 1,603,973,069 bytes in size.

I used this file to compare MrBCX's Fetchline with the getline added in Pelles C. The getline addition is part of ISO/IEC TR 24731-2, "Extensions to the C library, Dynamic allocation functions".

In the timing test, both getline and Fetchline were stripped-down implementations that simply retrieved one line at a time from the 1 million line DeRangeD.txt file until the end was reached. Running on a Ryzen7 1800X:

getline took 13.5625 seconds

while

Fetchline took 55.0625 seconds

BCX LINE INPUT, with DIM Buffer$ * 1048832 to accommodate lines longer than 2048 bytes, took 13.40625 seconds

BCX LINE INPUT, with DIM Buffer$[1048832] AS CHAR to accommodate lines longer than 2048 bytes, took 14.375 seconds
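
For anyone who has not used the C getline, here is a small sketch of a stripped-down read loop, written so that it compiles as C++. It assumes a C library that provides the POSIX / TR 24731-2 style getline(char **, size_t *, FILE *); the struct LineStats and the function scan_with_getline are made-up names, and this is not the test program that produced the timings above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>      // FILE, fopen, getline (on POSIX-style libraries)
#include <stdlib.h>     // free
#include <sys/types.h>  // ssize_t

// Hypothetical result record for this sketch.
struct LineStats {
    std::size_t lines = 0;    // number of lines read
    std::size_t bytes = 0;    // total bytes, line terminators included
    std::size_t longest = 0;  // longest line, terminator included
};

// Reads the file one line at a time with getline() until the end is reached.
// getline() allocates and grows the buffer itself, so no fixed-size buffer
// has to be declared up front.
bool scan_with_getline(const char* path, LineStats& stats) {
    assert(path != nullptr);
    FILE* fp = fopen(path, "rb");
    if (!fp) return false;

    char* buf = nullptr;  // allocated by getline on the first call
    size_t cap = 0;       // current capacity, maintained by getline
    ssize_t len;
    while ((len = getline(&buf, &cap, fp)) != -1) {
        const std::size_t n = static_cast<std::size_t>(len);
        ++stats.lines;
        stats.bytes += n;
        if (n > stats.longest) stats.longest = n;
    }

    free(buf);  // the caller owns the buffer getline allocated
    fclose(fp);
    return true;
}
```

The design point is that the caller passes a null pointer and zero capacity and lets the library grow the buffer, which is what removes the need for an oversized DIM.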


I have attached a copy of DeRangeD.bas to this post.

« Last Edit: February 25, 2020, 10:41:03 AM by Robert »

jcfuller

  • Jr. Member
  • **
  • Posts: 75
    • View Profile
Re: DeRangeD
« Reply #1 on: February 24, 2020, 04:09:05 AM »
Robert,
  What do you use for timing? I remember I used to have a simple app that was quite useful, but I don't know where or what it was :)

James

Robert

Re: DeRangeD
« Reply #2 on: February 24, 2020, 04:27:35 AM »
I just used BCX TIMER. Check out the BCX Help file.

jcfuller

Re: DeRangeD
« Reply #3 on: February 24, 2020, 04:50:47 AM »
Robert,
  I preferred an external app for timing, as it weeded out the boasting of interpreters on how fast they were when it took 10-15 seconds for them to initialize :)

James

MrBcx

  • Administrator
  • Sr. Member
  • *****
  • Posts: 264
    • View Profile
Re: DeRangeD
« Reply #4 on: February 24, 2020, 08:48:19 AM »
That was pretty interesting Robert.

Not that I expect it will change things very much (if at all), but I thought I'd ask if
you used the updated version of Fetchline that I uploaded 2 days ago (MST)?


FUNCTION Fetchline ( FP as FILE, BYREF MyBuf$ ) AS UINT
  DIM LineLen AS UINT
  DIM Static PrevLen
    LineLen = GetLineLen (FP)
    If LineLen > PrevLen THEN   ' <<- Only REDIM when needed
      REDIM MyBuf$ * LineLen
    END IF
    LINE INPUT FP, MyBuf$
    PrevLen = LineLen
  FUNCTION = LineLen
END FUNCTION
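
The "measure first, grow only when needed" idea in Fetchline can be sketched in C++ as below. This is an illustration only: how GetLineLen actually works is not shown in this thread, so peek_line_length is a hypothetical stand-in that assumes it scans ahead to the next newline and then seeks back. The name fetch_line is also made up.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>
#include <string>

// Hypothetical stand-in for GetLineLen: counts the bytes up to and including
// the next '\n' (or up to EOF), then seeks back to where it started.
// Note: ftell returns a long, which limits file size on some platforms.
std::size_t peek_line_length(FILE* fp) {
    assert(fp != nullptr);
    const long start = ftell(fp);
    std::size_t n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        ++n;
        if (c == '\n') break;
    }
    fseek(fp, start, SEEK_SET);  // also clears the EOF indicator
    return n;
}

// Reads one line (terminator included) into buf and returns its length,
// or 0 at end of file.
std::size_t fetch_line(FILE* fp, std::string& buf) {
    const std::size_t len = peek_line_length(fp);
    buf.resize(len);  // reallocates only if len exceeds the current capacity
    if (len > 0 && fread(&buf[0], 1, len, fp) != len) {
        buf.clear();  // short read: treat as end of input
        return 0;
    }
    return len;
}
```

If GetLineLen does work along these lines, every byte of the file is visited twice, once to measure and once to read. That could be part of the gap between Fetchline and getline, but it is a guess and not something measured in this thread.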

jcfuller

Re: DeRangeD
« Reply #5 on: February 24, 2020, 09:24:27 AM »
Robert,
  As you have this huge file already created, I'd like to see the numbers using C++ STL iostream, std::string and getline.

James

Robert

Re: DeRangeD
« Reply #6 on: February 24, 2020, 12:50:02 PM »
Hi James:

Give me the examples and I will run them.

Or I could send you a copy of the file. But even with high-compression 7z it only squishes down from 1.6 to 1.3 gigabytes, which shows that the data in the file is very random.

Robert

Re: DeRangeD
« Reply #7 on: February 24, 2020, 01:04:50 PM »
Yes, MrBCX, I did use the updated version. I just ran the earlier version and it was only a bit slower: 59.6875 seconds, compared to 55.0625 seconds for the updated version.

jcfuller

Re: DeRangeD
« Reply #8 on: February 24, 2020, 01:59:53 PM »
Robert,
  I think this will work.
James
Code:

$CPPHDR
$CPP
$NOMAIN
$ONEXIT "VSCPP.BAT $FILE$ -m64 con"
'------------------------------------------------------------------------------
$HEADER
    typedef std::string stdstr;
$HEADER
'------------------------------------------------------------------------------
$CCODE CONST
    using namespace std;
$CCODE

'==============================================================================
Function LoadTheBuffer(f As const char Ptr) As BOOL
   
    Dim As stdstr ssLine
    Raw As ifstream InFile(f)
   
    If InFile.is_open() Then
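        ' Note: good() is tested before each read, so a file that ends in a
        ' newline gets one extra getline call that fails and reads nothing.
        While InFile.good()
            getline(InFile,ssLine)
        Wend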
    End If
    InFile.close()
    Function = 0
End Function
'==============================================================================
function main()
    DIM Start!
    DIM Finish!
    DIM Duration!
    Start! = TIMER
    LoadTheBuffer("DeRangeD.txt")
    Finish! = TIMER
    Duration! = Finish! - Start!
    PRINT Duration
   
    pause
End Function

Robert

Re: DeRangeD
« Reply #9 on: February 24, 2020, 03:35:45 PM »
Hi James:

I moved the timing tight around the WHILE/WEND loop and found that with a Nuwen64 compile it took 7.625 seconds to run through DeRangeD.txt. Compiled with VS 2019 cl.exe, it took 8.90625 seconds.

Very respectable when compared to LINE INPUT and Pelle's getline at around 14 seconds.
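
To make "timing tight around the loop" concrete, here is a plain C++ sketch rather than BCX. It uses std::chrono::steady_clock instead of BCX TIMER, and the names TimedRead and time_getline_loop are invented for the example, so treat it as one possible arrangement and not as the code that produced the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical result record for this sketch.
struct TimedRead {
    std::size_t lines = 0;  // lines successfully read
    double seconds = 0.0;   // time spent inside the read loop only
};

// Opens the file first, then starts the clock, so that only the
// line-by-line reading is measured.
TimedRead time_getline_loop(const std::string& path) {
    assert(!path.empty());
    TimedRead result;

    std::ifstream in(path, std::ios::binary);
    if (!in.is_open()) return result;  // zero lines, zero seconds

    std::string line;
    const auto start = std::chrono::steady_clock::now();
    while (std::getline(in, line)) {  // loop ends on the first failed read
        ++result.lines;
    }
    const auto stop = std::chrono::steady_clock::now();

    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

Testing the result of std::getline directly in the loop condition, instead of checking good() before each read, means a failed final read is never counted as a line.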

I may run a check to see whether they are actually fetching the lines accurately or just going through the motions and faking it. I think comparing their reported line lengths would be enough, although I guess each fetched line could also be compared against a copy of DeRangeD.txt during extraction.
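
A check along those lines could look roughly like the C++ sketch below: collect the line lengths a reader reports and compare them with an independent byte-by-byte scan of the same file. The function names are made up for the example, and std::getline simply stands in for whichever reader is being checked.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Line lengths as reported by std::getline (terminating '\n' not counted).
std::vector<std::size_t> lengths_via_getline(const std::string& path) {
    std::vector<std::size_t> out;
    std::ifstream in(path, std::ios::binary);
    std::string line;
    while (std::getline(in, line)) out.push_back(line.size());
    return out;
}

// Independent reference: count the bytes between '\n' characters one by one.
std::vector<std::size_t> lengths_via_byte_scan(const std::string& path) {
    std::vector<std::size_t> out;
    std::ifstream in(path, std::ios::binary);
    std::size_t current = 0;
    bool pending = false;  // true if bytes were seen since the last newline
    char c;
    while (in.get(c)) {
        if (c == '\n') {
            out.push_back(current);
            current = 0;
            pending = false;
        } else {
            ++current;
            pending = true;
        }
    }
    if (pending) out.push_back(current);  // last line without a newline
    return out;
}

// True when both methods report the same number of lines and the same
// length for every line.
bool readers_agree(const std::string& path) {
    assert(!path.empty());
    return lengths_via_getline(path) == lengths_via_byte_scan(path);
}
```

Because both readers open the file in binary mode, a CR before each LF is counted as part of the line by both, so the comparison stays fair on files with CR/LF endings. Comparing lengths only would not catch a reader that returns the right number of wrong characters; that would need the content comparison mentioned above.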

So how about the STL iostream and std::string? Are you up to writing them up so I can check them?

MrBcx

Re: DeRangeD
« Reply #10 on: February 24, 2020, 04:00:21 PM »

Wait ... WHAT?  All this time I assumed getline () was only in the C++ domain.

Writing Fetchline was a complete and unsatisfying waste of time.

Robert

Re: DeRangeD
« Reply #11 on: February 24, 2020, 04:45:38 PM »

Uh ??? Are we on the same page here?

Maybe a MrBCX doppelganger?

Maybe this will kickstart those grey cells.

https://bcxbasiccoders.com/smf/index.php?topic=140.msg560#msg560

MrBcx

Re: DeRangeD
« Reply #12 on: February 24, 2020, 05:03:39 PM »
Okay ... I'm entitled to a senior moment every now and then  8)

Robert

Re: DeRangeD
« Reply #13 on: February 24, 2020, 05:33:56 PM »
 I know, I know ... I look at the children and they all turn their heads and giggle.

Robert

Re: DeRangeD
« Reply #14 on: February 24, 2020, 10:14:41 PM »
New version, attached to original post of this thread, now takes less than a minute instead of 12 minutes to build a 1 million line file on my Ryzen 7 1800X.