Trimming spaces and tabs inside a string

Started by Vortex, June 08, 2024, 04:33:52 AM


Vortex

Hello,

Here is an example that removes all spaces and tabs from a string:

FUNCTION RemoveSpaces (MyStr AS STRING, buff AS STRING) AS INTEGER

    LOCAL buff2 AS LPBYTE
    LOCAL t as UCHAR

    SET lookupTbl [] AS UCHAR
        1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

    END SET

    buff2 = (LPBYTE)buff
   
    DO
        t=*MyStr
        *buff = t
        buff = buff+lookupTbl[t]
        MyStr = MyStr+1
       
    LOOP WHILE t

    FUNCTION = (LPBYTE)buff-buff2-1
   
END FUNCTION


DIM l AS INTEGER
DIM b AS STRING

l = RemoveSpaces("   This   is a test. ", b)
PRINT "Trimmed string = ",b
PRINT "Length of the string = ",l
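For readers who want to see the lookup-table idea outside BCX, here is a minimal sketch in C (the language BCX translates to). It is not Vortex's generated code: the function name remove_spaces_tabs is made up for illustration, and the table is filled at run time instead of being written out as 256 literals. Every byte is stored, but the write position only advances when the table says the byte should be kept, so dropped bytes are simply overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the lookup-table technique.
   keep[c] is 1 for bytes to keep and 0 for bytes to drop.
   Only tab (9) and space (32) are dropped, as in the table above.
   dst must have room for strlen(src) + 1 bytes. */
size_t remove_spaces_tabs(const char *src, char *dst)
{
    unsigned char keep[256];
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    memset(keep, 1, sizeof keep);   /* keep everything by default */
    keep['\t'] = 0;                 /* drop tab                   */
    keep[' ']  = 0;                 /* drop space                 */

    for (;;) {
        unsigned char c = (unsigned char)*src++;
        dst[n] = (char)c;           /* always store the byte           */
        if (c == '\0')
            break;                  /* terminator stored, copy is done */
        n += keep[c];               /* advance only for kept bytes     */
    }
    return n;                       /* result length, excluding '\0'   */
}
```

The sketch stops on the terminator explicitly, so no "- 1" correction is needed when computing the length.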

airr

Interesting code, Vortex, Thanks.

Here is my take on it:


dim l,b as string

l = RemoveSpaces("   This   is a test. ",b)
print "Trimmed string = ",b
print "Length of the string = ", l

pause

FUNCTION RemoveSpaces (MyStr AS STRING, buff AS STRING) AS INTEGER
    replace spc$ with nul$ in MyStr
    memcpy(buff, MyStr, len(MyStr) + 1)  ' + 1 also copies the terminating null
    return len(buff)
END FUNCTION


Not sure it's 100% safe, but it seems to work in this instance.
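On the safety question, here is a hedged sketch in C of a variant that avoids the two things that could bite. In C, writing through a pointer to a string literal is undefined behavior, and a memcpy of exactly len bytes leaves the destination without a terminator unless it was already zeroed. Whether the C that BCX generates passes the literal straight through is an assumption I have not checked. The function name remove_spaces_copy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy first, then squeeze spaces out of the copy.
   The source is never modified, so passing a string literal is fine.
   dst must have room for strlen(src) + 1 bytes. */
size_t remove_spaces_copy(const char *src, char *dst)
{
    size_t len, i, j = 0;

    assert(src != NULL && dst != NULL);

    len = strlen(src);
    memcpy(dst, src, len + 1);      /* + 1 copies the terminator too */

    for (i = 0; i < len; i++) {     /* compact the copy in place     */
        if (dst[i] != ' ')
            dst[j++] = dst[i];
    }
    dst[j] = '\0';                  /* cut off the leftover tail     */
    return j;
}
```

Because the destination is terminated explicitly, the result does not depend on the buffer having been zero-filled beforehand.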

AIR.

MrBcx

Below is another way that uses built-in BCX features to remove all 6 common white space characters.

Vortex's lookup table needs only minor modifications to achieve the same results.
In fact, his lookup table could be modified to filter any of the 256 possible byte values (0 to 255).
His function also runs much faster, which matters most in data-intensive operations.


DIM b AS STRING
b = "   This   is a test. "

RemoveAllWhiteSpace(b)

PRINT "Trimmed string = ", b
PRINT "Length of the string = ", LEN(b)
PAUSE

SUB RemoveAllWhiteSpace(MyStr AS STRING)
    REMOVE TAB$ from MyStr  '  9 ASCII character
    REMOVE LF$  from MyStr  ' 10 ASCII character
    REMOVE VT$  from MyStr  ' 11 ASCII character
    REMOVE FF$  from MyStr  ' 12 ASCII character
    REMOVE CR$  from MyStr  ' 13 ASCII character
    REMOVE SPC$ from MyStr  ' 32 ASCII character
END SUB



' Ref:  https://bcxbasiccoders.com/webhelp/html/bcxsystemvariables.htm

' Ref:  http://www.tutorialspoint.com/c_standard_library/c_function_isspace.htm
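To illustrate the remark about adapting the lookup table to filter arbitrary characters, here is a minimal C sketch that builds the table from a string of characters to drop. The names build_keep_table and filter_with_table are hypothetical, and this is one possible arrangement, not a translation of anyone's code in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: keep[c] becomes 0 for every byte listed in
   drop and 1 for all other byte values. */
void build_keep_table(unsigned char keep[256], const char *drop)
{
    assert(keep != NULL && drop != NULL);
    memset(keep, 1, 256);
    for (; *drop != '\0'; drop++)
        keep[(unsigned char)*drop] = 0;
}

/* Copies src to dst, skipping bytes whose table entry is 0.
   dst must have room for strlen(src) + 1 bytes. */
size_t filter_with_table(const char *src, char *dst,
                         const unsigned char keep[256])
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        dst[n] = (char)c;   /* store, then advance only if kept */
        n += keep[c];
    }
    dst[n] = '\0';
    return n;
}
```

Building the table once with "\t\n\v\f\r " as the drop string covers the six whitespace characters listed above; the same table can then be reused for any number of strings.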

airr

@MrB, your second cited reference (for the isspace() function) got me thinking, so I threw this together:


Function RemoveSpaces(MyStr as string, buff as string) as integer
    dim as int i, j = 0, length = len(MyStr)

    for integer i = 0 to length -1
        if not isspace(MyStr[i]) then
            buff[j] = MyStr[i] 
            j++         
        end if
    next

    buff[j] = 0   ' null-terminate the result

    return j
End Function


AIR.

MrBcx

Hello Armando,

My earlier code is for users who prefer or need an easy to understand procedure to tinker with.

Below is a string function that derives from your code.  I like this a lot!


PRINT NoWhiteSpace ("   This     is    a    test    ")

PAUSE

FUNCTION NoWhiteSpace (MyStr AS STRING) AS STRING
    DIM buff$ * LEN(MyStr)
    DIM AS INT i, j = 0, length = LEN(MyStr)
    FOR INTEGER i = 0 TO length -1
        IF NOT isspace(MyStr[i]) THEN
            buff[j] = MyStr[i]
            j++
        END IF
    NEXT
    FUNCTION = buff$
END FUNCTION


djsb

Maybe someday I'll understand how the code posted above works.

airr

Using MrB's function, I pasted the code into ChatGPT and asked it to explain the code.  Here is the result:

Quote:

Purpose of the Function

The NoWhiteSpace function is designed to take a string as input and return a new string that is identical to the input string but with all whitespace characters removed.

Steps Involved

    Declaration: The function is named NoWhiteSpace and it expects a single string argument. It returns a string as well.

    Variable Initialization:
        A buffer string (buff$) is created to hold the characters from the input string that are not whitespace. This buffer string is initialized to have the same length as the input string, although it will not necessarily use all of this space.
        Several integer variables are declared:
            i will serve as a loop counter to iterate through each character in the input string.
            j is initialized to 0 and will be used to keep track of the current position in the buffer string where the next non-whitespace character should be placed.
            length holds the length of the input string to determine how many characters need to be checked.

    Loop Through Characters:
        A loop runs from the first character to the last character of the input string.
        Inside this loop, each character of the input string is checked to see if it is a whitespace character.

    Check and Append Non-Whitespace Characters:
        If the character is not a whitespace character, it is added to the buffer string at the current position indicated by j.
        The position counter j is then incremented to prepare for the next non-whitespace character.

    Return the Result:
        After the loop has processed all characters in the input string, the buffer string (which now contains only the non-whitespace characters) is returned as the result of the function.

Summary

In summary, the NoWhiteSpace function processes an input string character by character, removing any whitespace characters and constructing a new string with only the non-whitespace characters. This new string is then returned as the output of the function.
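The steps above can be sketched in plain C, which is roughly the shape of code a BCX function like this boils down to. This is an illustrative sketch, not BCX's actual output: the name no_white_space is hypothetical, the caller supplies the destination buffer, and the cast to unsigned char is added because isspace() is only defined for values in that range (or EOF).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the two-index copy described above.
   i walks the input, j is the next free slot in the output.
   dst must have room for strlen(src) + 1 bytes. */
size_t no_white_space(const char *src, char *dst)
{
    size_t i, j = 0, length;

    assert(src != NULL && dst != NULL);

    length = strlen(src);
    for (i = 0; i < length; i++) {
        /* copy only characters that are not whitespace */
        if (!isspace((unsigned char)src[i]))
            dst[j++] = src[i];
    }
    dst[j] = '\0';   /* terminate so dst is a valid C string */
    return j;        /* number of characters kept            */
}
```

Note that the output needs one byte more than the number of kept characters, for the terminator.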

AIR.

MrBcx

There's more than one way to skin a cat.

The function below removes from the first argument every character that appears in the second argument.


DIM MyString AS STRING

MyString  = "   This   is a test. "
PRINT MyString

MyString  = RemoveAny(MyString, CHR$(9, 10, 11, 12, 13, 32))  ' <<-- These 6 chars collectively make up "white-space"

PRINT "Trimmed string = ", MyString
PRINT "Length of the string = ", LEN(MyString )
PAUSE

FUNCTION RemoveAny (Target AS STRING, CharsToRemove AS STRING) AS STRING
    DIM AS PCHAR WritePtr
    DIM AS INTEGER TargetLen
    DIM AS STRING Result
    TargetLen = LEN(Target)
    WritePtr = Result
    FOR INT i = 1 TO TargetLen
        IF INSTR(CharsToRemove, MID$(Target, i, 1)) = 0 THEN
            *WritePtr = Target[i-1]
            INCR WritePtr
        END IF
    NEXT
    *WritePtr = 0   ' Null-terminate the result
    FUNCTION = Result$
END FUNCTION
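For comparison, the same "remove any of these characters" idea can be sketched in C with strchr() doing the membership test that INSTR does above. The name remove_any and the caller-supplied output buffer are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy target to dst, skipping every character
   that occurs in chars_to_remove.
   dst must have room for strlen(target) + 1 bytes. */
size_t remove_any(const char *target, const char *chars_to_remove, char *dst)
{
    size_t n = 0;

    assert(target != NULL && chars_to_remove != NULL && dst != NULL);

    for (; *target != '\0'; target++) {
        /* strchr returns NULL when the character is not in the set */
        if (strchr(chars_to_remove, *target) == NULL)
            dst[n++] = *target;
    }
    dst[n] = '\0';   /* null-terminate the result */
    return n;
}
```

Calling it with "\t\n\v\f\r " as the second argument gives the same six-character whitespace filter as the CHR$(9, 10, 11, 12, 13, 32) call above.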






MrBcx

I promise, this is the last one I'll upload today.  I just found it in my snippets library.



FUNCTION REMOVE_ANY$ (szMainStr AS LPCTSTR, szMatchStr AS LPCTSTR, Sensitivity = TRUE)
    DIM RAW AS INT nLen = LEN(szMainStr) + 1
    DIM szStr$ * nLen
    szStr$ = szMainStr$
    IF Sensitivity = TRUE THEN
        FOR LONG i = 1 TO LEN(szMatchStr)   ' one pass per character to remove
            szStr$ = REMOVE$(szStr$, MID$(szMatchStr$, i, 1))
        NEXT
    ELSE
        FOR LONG i = 1 TO LEN(szMatchStr)
            szStr$ = IREMOVE$(szStr$, MID$(szMatchStr$, i, 1))
        NEXT
    END IF
    FUNCTION = szStr$
END FUNCTION

'*********************************************************************************************
'                                     E X A M P L E
'*********************************************************************************************
DIM a$
a$ = "abcABCdefghi1234567890"
PRINT a$                          ' Before any removals      results:  abcABCdefghi1234567890
PRINT
a$ = REMOVE_ANY$(a$, "abc456")    ' Case-sensitive removal   results:  ABCdefghi1237890
PRINT a$
PRINT
a$ = REMOVE_ANY$(a$, "abc456", 0) ' Case-insensitive removal results:  defghi1237890
PRINT a$
PAUSE
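As a cross-check on the case-sensitive versus case-insensitive behavior, here is a single-pass C sketch of the same idea. It does not use or imitate REMOVE$ or IREMOVE$; the names in_set and remove_any_case are hypothetical, and the comparison folds case with tolower(), which only covers single-byte characters in the current C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when c matches any character in set, honoring the
   case_sensitive flag; returns 0 otherwise. */
int in_set(char c, const char *set, int case_sensitive)
{
    for (; *set != '\0'; set++) {
        if (case_sensitive) {
            if (c == *set)
                return 1;
        } else if (tolower((unsigned char)c) == tolower((unsigned char)*set)) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical sketch: copy src to dst without the characters in set.
   dst must have room for strlen(src) + 1 bytes. */
size_t remove_any_case(const char *src, const char *set,
                       int case_sensitive, char *dst)
{
    size_t n = 0;

    assert(src != NULL && set != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        if (!in_set(*src, set, case_sensitive))
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

With the example string above and "abc456" as the set, the case-sensitive call leaves ABCdefghi1237890 and the case-insensitive call leaves defghi1237890, matching the results shown in the comments.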


airr

Okay, I slightly refactored my isspace() version to use a while loop and added comments (courtesy of ChatGPT - too lazy to do that part myself!):


' Declare the function RemoveSpaces that takes a string (MyStr) and a buffer string (buff) as inputs and returns an integer
Function RemoveSpaces(MyStr as string, buff as string) as integer

    ' Declare integer variables i, j, and length. Initialize length to the length of MyStr.
    ' i and j are used as counters and are explicitly initialized to 0.
    dim as int i = 0, j = 0, length = len(MyStr)
   
    ' Start a while loop that runs as long as i is less than length of MyStr
    while i < length

        ' Check if the current character in MyStr (at index i) is not a whitespace character.
        ' If it's not a whitespace, assign it to the buff string at index j and then increment j.
        if not isspace(MyStr[i]) then buff[j++] = MyStr[i]

        ' Increment the counter i by 1 to move to the next character in MyStr
        incr i

    ' End of the while loop
    wend

    ' Null-terminate buff so it holds a valid string, then return j,
    ' which is the length of the new string in buff without whitespace
    buff[j] = 0
    return j

' End of the function
End Function


AIR.