Topic: Remove Duplicate Strings in Array

Robert

« on: May 10, 2021, 02:24:43 PM »
Use at your own risk!

Based on post from Todd at

https://stackoverflow.com/questions/35425255/remove-duplicates-from-array-of-strings-in-c

Code:

DIM ArrayElements, UniQCount, Int1
DIM AlphaBits[10000] AS CHAR PTR

FOR INTEGER i = 0 TO 9999
 AlphaBits[i] = CHR$(RND2(65, 90))
NEXT i

ArrayElements = UBOUND(AlphaBits)

UniQCount = UniQ(ArrayElements, AlphaBits)

?

PRINT " Of the ", ArrayElements + 1, " elements of the UniQ array,", _
UniQCount, " are unique."

?

Int1 = UniQCount - 1
PRINT " The UniQ elements are : ";
FOR INTEGER i = 0 TO Int1 - 1
 PRINT $AlphaBits[i], ", ";
NEXT i
PRINT $AlphaBits[Int1]

?

QSORT DYNAMIC AlphaBits$, UniQCount
PRINT " which, when sorted, are : ";
FOR INTEGER i = 0 TO Int1 - 1
 PRINT $AlphaBits[i], ", ";
NEXT i
PRINT $AlphaBits[Int1]

?

END

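FUNCTION UniQ(elems, arr AS STRARRAY)
 ' elems is the index of the last element, not the element count
 DIM AS INTEGER i, j=1, k=1

 FOR i = 0 TO elems
  ' k is the next slot for a string that differs from arr[i]
  k = i + 1
  FOR j = i + 1 TO elems
   ' strcmp returns nonzero when the two strings differ
   IF strcmp((char *)arr[i], (char *)arr[j]) THEN
    arr[k] = arr[j]
    k++
   END IF
  NEXT j
  ' shrink the working range by the number of duplicates of arr[i] skipped
  elems -= j - k
 NEXT i

 ' convert the last index back to a count of unique strings
 FUNCTION = elems + 1

END FUNCTION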


The "removal" is really an in-place reordering: the unique strings are compacted to the front of the array and the remaining slots are left holding stale entries, so the original array is essentially destroyed.
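
To make the in-place behavior easier to see, here is a plain C sketch of the same compaction. It is an illustration only, not the BCX output: the name uniq_in_place is made up for this example, and it takes an element count rather than the last index that UniQ takes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of BCX.
   Compacts the distinct strings of arr[0..n-1] to the front of the
   array, keeping the first occurrence of each, and returns how many
   distinct strings there are. Only pointers are moved. Slots at and
   beyond the returned count still hold leftover pointers. */
size_t uniq_in_place(const char *arr[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t k = i + 1;                 /* next slot to keep a string in */
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(arr[i], arr[j]) != 0)
                arr[k++] = arr[j];        /* differs from arr[i]: keep it */
        }
        n = k;                            /* duplicates of arr[i] are gone */
    }
    return n;
}
```

Called on the five strings "B", "A", "B", "C", "A", this returns 3 and leaves "B", "A", "C" in the first three slots. The last two slots still contain old pointers, which is why the array should be treated as destroyed beyond the returned count.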

The strcmp casts are not necessary when the translated code is compiled as C.

If a case-insensitive compare is needed, use stricmp or another case-insensitive comparison in place of strcmp, and adjust the QSORT call to match. An optional case-sensitivity flag could be added to the function so that it supports both kinds of comparison.
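
One way the optional flag might look, sketched in plain C rather than BCX. Everything named here (compare_nocase, uniq_flagged, sort_strings, the ignore_case parameter) is hypothetical. Because stricmp is not standard C, the sketch carries its own small case-insensitive compare, and the sort helper picks a matching comparator so the dedup and the sort agree.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for stricmp (which is not standard C). */
static int compare_nocase(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Same in-place compaction as the posted routine, with a hypothetical
   ignore_case flag: 0 = exact compare, nonzero = ignore case.
   n is an element count. Returns the number of distinct strings. */
size_t uniq_flagged(const char *arr[], size_t n, int ignore_case)
{
    int (*cmp)(const char *, const char *) =
        ignore_case ? compare_nocase : strcmp;

    for (size_t i = 0; i < n; i++) {
        size_t k = i + 1;
        for (size_t j = i + 1; j < n; j++) {
            if (cmp(arr[i], arr[j]) != 0)
                arr[k++] = arr[j];
        }
        n = k;
    }
    return n;
}

/* qsort comparators: each receives pointers to the array slots. */
static int qs_case(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int qs_nocase(const void *a, const void *b)
{
    return compare_nocase(*(const char *const *)a, *(const char *const *)b);
}

/* Sort with the same notion of equality that the dedup used. */
void sort_strings(const char *arr[], size_t n, int ignore_case)
{
    qsort(arr, n, sizeof arr[0], ignore_case ? qs_nocase : qs_case);
}
```

With ignore_case nonzero, "apple" and "Apple" count as one string, and the first spelling encountered is the one kept.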

I have a feeling that, again, I may be reinventing the wheel with this demo.


MrBcx

« Reply #1 on: May 10, 2021, 02:48:28 PM »
Quote
   I have a feeling that, again, I may be reinventing the wheel with this demo.

And those of us with wheel fetishes thank you for your generous contribution.

 

rexxitall

« Reply #2 on: June 16, 2021, 05:27:14 AM »
Dictionaries come in handy here. Pseudocode:
loop over the array and
test whether each array element already exists in the dictionary
if not, add the array element as a key, with its index as the value, to the dictionary
and add the array element to the new array.
if the array element already exists in the dictionary, it is a duplicate, so do nothing with it
This takes very little code and is quite fast.

Anything else requires some kind of sorting first and/or building some kind of list. Otherwise you are left with a time-consuming FOR/NEXT loop over the whole array to search for each duplicate key.
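
The pseudocode above could be sketched in plain C roughly as follows. This is only an illustration under some assumptions: C has no built-in dictionary, so the sketch uses a small open-addressing hash table that stores keys only (the index value from the pseudocode is not needed just to detect duplicates), and the names hash_str and uniq_hashed are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a style hash of a NUL-terminated string. */
static size_t hash_str(const char *s)
{
    size_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper. Copies the first occurrence of each string in
   src[0..n-1] to dst, preserving order. dst must have room for n
   pointers. Returns the number of strings copied, or (size_t)-1 if
   the table could not be allocated. */
size_t uniq_hashed(const char *const src[], size_t n, const char *dst[])
{
    size_t cap = 16;
    while (cap < 2 * n)              /* power of two, at most half full */
        cap *= 2;

    const char **table = malloc(cap * sizeof *table);
    if (table == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < cap; i++)
        table[i] = NULL;             /* NULL marks an empty slot */

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = hash_str(src[i]) & (cap - 1);
        /* linear probing: walk until we find the key or an empty slot */
        while (table[slot] != NULL && strcmp(table[slot], src[i]) != 0)
            slot = (slot + 1) & (cap - 1);
        if (table[slot] == NULL) {   /* not seen before */
            table[slot] = src[i];
            dst[count++] = src[i];
        }
    }

    free(table);
    return count;
}
```

Each string is hashed and looked up once, so the work grows roughly in proportion to the array size instead of with its square, at the cost of a temporary table and a second array for the results.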

Robert

« Reply #3 on: January 29, 2022, 10:41:35 PM »
Inspired by Thomas's dictionary suggestion, the following code checks for duplicates against the array as the array is being filled.

O.K. with Pelles and MSVC.

Code:

DIM ArrayElements, ArrayLocation, UniQCount, Int1
DIM AlphaBits$[10000]
DIM TheBit$

TheBit$ = CHR$(RND2(65, 90))
AlphaBits$[0] = TheBit$

FOR INTEGER i = 1 TO 9999

 TheBit$ = CHR$(RND2(65, 90))

 FOR INTEGER ii = 0 TO ArrayLocation
  IF TheBit$ = AlphaBits$[ii] THEN
   GOTO Jump1
  END IF
 NEXT ii

 ArrayLocation++
 AlphaBits$[ArrayLocation] = TheBit$
 Jump1:

NEXT i

?
ArrayLocation++
PRINT " Of the 10000 elements generated, ", _
ArrayLocation, " are unique and were inserted in the array."

?

Int1 = ArrayLocation - 1
PRINT " The UniQ elements are : ";
FOR INTEGER i = 0 TO Int1 - 1
 PRINT $AlphaBits[i], ", ";
NEXT i
PRINT $AlphaBits[Int1]

?

QSORT AlphaBits$, ArrayLocation
PRINT " which, when sorted, are : ";
FOR INTEGER i = 0 TO Int1 - 1
 PRINT $AlphaBits[i], ", ";
NEXT i
PRINT $AlphaBits[Int1]

?

END