Is it 32bit or 64bit?

Started by airr, May 20, 2024, 04:56:37 PM


MrBcx

Armando -- We should write a book.

After deducting for business expenses, I'll bet we could each realize a net profit of -$ 67.43

;D

airr

Adding on to MrB's reply ;D, the default Pelles C installation comes with the pope.exe program, which lets you see the structures he's referring to.

AIR.

MrBcx

Adding onto AIR's reply, it's important to understand that the List of Signatures is only the beginning.
Most binary files follow the signatures with various data structures that describe things about the
organization of the contents of the files. 

For example, MS-DOS and Windows executables (.exe, .dll, etc.; plain .com files are a headerless exception)
contain information about what, where, and how much data and code are located in the file. The structures
that hold that info can be quite complex. The operating system knows how to analyze and extract that
information, allocate and assign memory addresses, and perform integrity checks before finally turning it
all over to the CPU and the I/O components that "run" the program.
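As a rough illustration of the point above, here is a minimal Python sketch (Python is used only for illustration, and the helper name read_dos_header is made up for this example) that pulls two fields out of the 64-byte DOS header: e_magic at offset 0 and e_lfanew at offset 0x3C. The buffer is synthetic, not a real executable.

```python
import struct

DOS_HEADER_SIZE = 64          # sizeof(IMAGE_DOS_HEADER)
IMAGE_DOS_SIGNATURE = 0x5A4D  # the bytes "MZ" read as a little-endian WORD

def read_dos_header(data: bytes):
    """Return (e_magic, e_lfanew) from the first 64 bytes of a file image."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0)      # WORD at offset 0
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)  # LONG at offset 0x3C
    return e_magic, e_lfanew

# Synthetic header for demonstration only (not a real executable)
demo = bytearray(DOS_HEADER_SIZE)
demo[0:2] = b"MZ"
struct.pack_into("<l", demo, 0x3C, 0x80)  # pretend the PE header starts at 0x80

e_magic, e_lfanew = read_dos_header(bytes(demo))
print(hex(e_magic), hex(e_lfanew))
```

The e_lfanew value is what tells a loader (or a tool like the ones in this thread) where the PE header begins.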


airr

Quote from: dragon57 on September 24, 2024, 06:42:33 PM
Are those headers one way the Un*x/Lin*x 'file' command works? I hadn't thought much about how that worked, but 'file' was a script staple of mine years ago.

On a basic level, yes.  The 'file' command incorporates 'magic numbers' (aka 'File Signature') in a sort of lookup table to identify the file type.  It retrieves this value from an offset in the file you're checking.

For an idea of the types of files that can be queried, the following Wiki page has a long list:  List of file signatures
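To make the lookup-table idea concrete, here is a small Python sketch. It is not how 'file' is actually implemented (its magic database is far richer); the table below is a hand-picked assumption with a few well-known signatures, and the function name identify is made up for this example.

```python
# (offset, magic bytes, description); a tiny illustrative table, not a real magic database
SIGNATURES = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"MZ", "DOS/Windows executable"),
    (0, b"PK\x03\x04", "ZIP archive (also docx, xlsx)"),
    (0, b"%PDF-", "PDF document"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
]

def identify(data: bytes) -> str:
    """Return the description of the first signature found at its offset."""
    for offset, magic, description in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return description
    return "unknown"

print(identify(b"%PDF-1.7\n"))
print(identify(b"MZ\x90\x00"))
print(identify(b"plain text"))
```

In practice you would read only the first few bytes of the file and pass them in, since all of these signatures sit at offset 0.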

AIR.

MrBcx

Quote from: dragon57 on September 24, 2024, 06:42:33 PM
Are those headers one way the Un*x/Lin*x 'file' command works? I hadn't thought much about how that worked, but 'file' was a script staple of mine years ago.

As it pertains to binary files, headers typically contain unique metadata at the beginning of the file. 

dragon57

Are those headers one way the Un*x/Lin*x 'file' command works? I hadn't thought much about how that worked, but 'file' was a script staple of mine years ago.

MrBcx

Quote from: djsb on September 24, 2024, 08:32:34 AM
Do ALL non human-readable binary files have a header?
No.

Quote
Or does this only apply to executable and DLL files based on Windows/MSDOS?
No.  Innumerable filetypes have headers: zip, xlsx, docx, pdf, mp3, etc.

Quote
Is there any documentation on this anywhere?
I'll leave that research exercise to you.


MrBcx

That's a keeper.

It doesn't matter whether one compiles this as 32-bit or 64-bit; it detects both 32-bit and 64-bit PE files.


airr

A simpler version to query 32/64bit, works on Intel and ARM for exe and dll files.


Function main(argc as integer, argv as pchar ptr)
    If argc != 2 Then
        Print "Usage: ", appexename$, " <executable>"
        Return 1
    End If

    ' Open the File
    Open Command$(1) For Binary Input as fp

    ' Read the DOS Header
    Dim As IMAGE_DOS_HEADER dosHeader
    Get$ fp, &dosHeader, SizeOF(dosHeader)
    if dosHeader.e_magic != IMAGE_DOS_SIGNATURE Then
        print "Not a valid executable file."
        Close fp
        return 1
    end if

    ' Advance to the PE Header
    seek fp, dosHeader.e_lfanew

    ' Read the PE Header
    Dim As DWORD peSignature
    Get$ fp, &peSignature, SizeOF(peSignature)
    if peSignature != IMAGE_NT_SIGNATURE Then
        print "Not a valid PE file."
        Close fp
        Return 1
    end if


    ' Read the File Header
    Dim as IMAGE_FILE_HEADER fileHeader
    Get$ fp, &fileHeader, SizeOF(fileHeader)

    ' Check the Machine Type
    select case fileHeader.Machine
        case IMAGE_FILE_MACHINE_AMD64, IMAGE_FILE_MACHINE_ARM64
            print "The binary is 64-bit."
        case IMAGE_FILE_MACHINE_I386, IMAGE_FILE_MACHINE_ARM
            print "The binary is 32-bit."
        case else
            print "Unknown Architecture."
    end select

    ' Close the File
    Close fp
    Return 0

End Function
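
For anyone who wants to try the same steps without a BCX toolchain, here is a rough Python sketch of the logic above, working on a bytes buffer instead of a file handle. The names pe_bitness and make_fake_pe are made up for this example, and the test images are synthetic. One addition that the listing above does not have: it checks that e_lfanew actually points inside the data before reading there.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D      # "MZ"
IMAGE_NT_SIGNATURE = 0x00004550   # "PE\0\0"
BITNESS = {
    0x014C: 32,  # IMAGE_FILE_MACHINE_I386
    0x01C0: 32,  # IMAGE_FILE_MACHINE_ARM
    0x8664: 64,  # IMAGE_FILE_MACHINE_AMD64
    0xAA64: 64,  # IMAGE_FILE_MACHINE_ARM64
}

def pe_bitness(data: bytes):
    """Return 32, 64, or None (unknown architecture) for a PE image."""
    if len(data) < 64 or struct.unpack_from("<H", data, 0)[0] != IMAGE_DOS_SIGNATURE:
        raise ValueError("not a valid executable file")
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    # Need 4 bytes of signature plus the 2-byte Machine field
    if e_lfanew < 0 or e_lfanew + 6 > len(data):
        raise ValueError("PE header offset is out of range")
    if struct.unpack_from("<I", data, e_lfanew)[0] != IMAGE_NT_SIGNATURE:
        raise ValueError("not a valid PE file")
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return BITNESS.get(machine)

def make_fake_pe(machine: int, e_lfanew: int = 0x80) -> bytes:
    """Build a synthetic image holding only the fields inspected above."""
    buf = bytearray(e_lfanew + 24)  # signature (4) + IMAGE_FILE_HEADER (20)
    buf[0:2] = b"MZ"
    struct.pack_into("<l", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    struct.pack_into("<H", buf, e_lfanew + 4, machine)
    return bytes(buf)

print(pe_bitness(make_fake_pe(0x8664)))
print(pe_bitness(make_fake_pe(0x014C)))
```

For a real file, something like pe_bitness(open(path, "rb").read()) should work, at the cost of reading the whole file into memory.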


AIR.

Robert

Quote from: MrBcx on May 24, 2024, 06:46:39 PM
This is what ChatGPT detected:

Analyzing the hex codes for the WhatIsIt[] structure, there are some discernible patterns:

    Vendor and Family Codes: The hex values often include vendor-specific or architecture-specific codes:
        ARM architectures (IMAGE_FILE_MACHINE_ARM, IMAGE_FILE_MACHINE_ARMNT, IMAGE_FILE_MACHINE_THUMB) use codes in the range of 0x1C0 to 0x1C4; IMAGE_FILE_MACHINE_ARM64 (0xAA64) falls outside that range.
        MIPS architectures (IMAGE_FILE_MACHINE_MIPS16, IMAGE_FILE_MACHINE_MIPSFPU, IMAGE_FILE_MACHINE_MIPSFPU16, IMAGE_FILE_MACHINE_R4000, IMAGE_FILE_MACHINE_WCEMIPSV2) use codes starting from 0x166 to 0x466.
        PowerPC architectures (IMAGE_FILE_MACHINE_POWERPC, IMAGE_FILE_MACHINE_POWERPCFP) use codes 0x1F0 and 0x1F1.
        SH architectures (IMAGE_FILE_MACHINE_SH3, IMAGE_FILE_MACHINE_SH3DSP, IMAGE_FILE_MACHINE_SH4, IMAGE_FILE_MACHINE_SH5) use codes from 0x1A2 to 0x1A8.

    Bit Width Indicators:
        Architectures with different bit widths have hex codes reflecting these widths:
            Alpha (IMAGE_FILE_MACHINE_ALPHA, IMAGE_FILE_MACHINE_ALPHA64) and AXP64 use 0x184 and 0x284.
            ARM architectures: ARM uses 0x1C0, ARM64 uses 0xAA64.
            RISC-V architectures: RISC-V 32 uses 0x5032, RISC-V 64 uses 0x5064, and RISC-V 128 uses 0x5128.

    Vendor Specific Codes:
        Unique architectures such as LoongArch32 and LoongArch64 use codes 0x6232 and 0x6264, respectively.
        EFI byte code (IMAGE_FILE_MACHINE_EBC) uses 0xEBC.
        Mitsubishi M32R uses a distinct value 0x9041.

    Processor Types:
        Intel architectures: IMAGE_FILE_MACHINE_I386 uses 0x14C, IMAGE_FILE_MACHINE_IA64 uses 0x200.
        AMD64 (IMAGE_FILE_MACHINE_AMD64) uses 0x8664.

Here's a summarized view of some patterns:

    ARM: 0x1C0 - 0x1C4
    MIPS: 0x166, 0x266, 0x366, 0x466, 0x169
    PowerPC: 0x1F0, 0x1F1
    SH: 0x1A2, 0x1A3, 0x1A6, 0x1A8
    Alpha: 0x184, 0x284

While not all codes fit perfectly into a pattern (e.g., IMAGE_FILE_MACHINE_M32R with 0x9041), many of the codes show a pattern within specific architecture families or bit-width groupings.

Thanks for that, MrBCX.

The Pelles C and Microsoft winnt.h headers have IMAGE_FILE_MACHINE_XXXX lists that are different from that in the code I posted. Some items the same, others different, some not there.

I suspect that somewhere there is a comprehensive list of machines/codes.

The ChatGPT response is impressive, but ever since my first exposure to A.I., the scene shown in Sarah Morgan's blog (link below) has popped into my head whenever A.I. is mentioned.

https://www.sarah-morgan.com/2014/09/22/the-man-behind-the-curtain-the-emotional-impact-of-new-technology/





MrBcx

This is what ChatGPT detected:

Analyzing the hex codes for the WhatIsIt[] structure, there are some discernible patterns:

    Vendor and Family Codes: The hex values often include vendor-specific or architecture-specific codes:
        ARM architectures (IMAGE_FILE_MACHINE_ARM, IMAGE_FILE_MACHINE_ARMNT, IMAGE_FILE_MACHINE_THUMB) use codes in the range of 0x1C0 to 0x1C4; IMAGE_FILE_MACHINE_ARM64 (0xAA64) falls outside that range.
        MIPS architectures (IMAGE_FILE_MACHINE_MIPS16, IMAGE_FILE_MACHINE_MIPSFPU, IMAGE_FILE_MACHINE_MIPSFPU16, IMAGE_FILE_MACHINE_R4000, IMAGE_FILE_MACHINE_WCEMIPSV2) use codes starting from 0x166 to 0x466.
        PowerPC architectures (IMAGE_FILE_MACHINE_POWERPC, IMAGE_FILE_MACHINE_POWERPCFP) use codes 0x1F0 and 0x1F1.
        SH architectures (IMAGE_FILE_MACHINE_SH3, IMAGE_FILE_MACHINE_SH3DSP, IMAGE_FILE_MACHINE_SH4, IMAGE_FILE_MACHINE_SH5) use codes from 0x1A2 to 0x1A8.

    Bit Width Indicators:
        Architectures with different bit widths have hex codes reflecting these widths:
            Alpha (IMAGE_FILE_MACHINE_ALPHA, IMAGE_FILE_MACHINE_ALPHA64) and AXP64 use 0x184 and 0x284.
            ARM architectures: ARM uses 0x1C0, ARM64 uses 0xAA64.
            RISC-V architectures: RISC-V 32 uses 0x5032, RISC-V 64 uses 0x5064, and RISC-V 128 uses 0x5128.

    Vendor Specific Codes:
        Unique architectures such as LoongArch32 and LoongArch64 use codes 0x6232 and 0x6264, respectively.
        EFI byte code (IMAGE_FILE_MACHINE_EBC) uses 0xEBC.
        Mitsubishi M32R uses a distinct value 0x9041.

    Processor Types:
        Intel architectures: IMAGE_FILE_MACHINE_I386 uses 0x14C, IMAGE_FILE_MACHINE_IA64 uses 0x200.
        AMD64 (IMAGE_FILE_MACHINE_AMD64) uses 0x8664.

Here's a summarized view of some patterns:

    ARM: 0x1C0 - 0x1C4
    MIPS: 0x166, 0x266, 0x366, 0x466, 0x169
    PowerPC: 0x1F0, 0x1F1
    SH: 0x1A2, 0x1A3, 0x1A6, 0x1A8
    Alpha: 0x184, 0x284

While not all codes fit perfectly into a pattern (e.g., IMAGE_FILE_MACHINE_M32R with 0x9041), many of the codes show a pattern within specific architecture families or bit-width groupings.
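
One of the "bit width" patterns above can be tested directly. The Python sketch below (the function name width_hint is made up for this example) checks whether the low byte of a machine code, written in hex, spells 32 or 64. Against the values posted in this thread it gives the right answer for the newer codes and no answer at all for the older ones, so treat it as a curiosity rather than a rule.

```python
def width_hint(machine: int):
    """Return 32 or 64 if the low byte reads 0x32 or 0x64 in hex, else None."""
    return {0x32: 32, 0x64: 64}.get(machine & 0xFF)

# Values taken from the WhatIsIt[] table in this thread
samples = {
    "AMD64": 0x8664, "ARM64": 0xAA64,
    "RISCV32": 0x5032, "RISCV64": 0x5064,
    "LOONGARCH32": 0x6232, "LOONGARCH64": 0x6264,
    "I386": 0x014C, "ALPHA64": 0x0284, "RISCV128": 0x5128,
}
for name, code in samples.items():
    print(f"{name:12} {code:#06x} -> {width_hint(code)}")
```

I386, ALPHA64 and RISCV128 all come back as None, which is the "not all codes fit" caveat in action.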

Robert

Quote from: Vortex on May 24, 2024, 02:32:22 PM
Hi Robert,

Your code is the complete and correct academic version. I am satisfied with the three machine types for practical purposes. The problem is that MS didn't specify a simple, linear enumeration, which would have made things easier:

Instead of this :

"IMAGE_FILE_MACHINE_UNKNOWN",0x0,"The content of this field is assumed to be applicable to any machine type",
  "IMAGE_FILE_MACHINE_ALPHA",0x184,"Alpha AXP, 32-bit address space",
  "IMAGE_FILE_MACHINE_ALPHA64",0x284,"Alpha 64, 64-bit address space",


I would prefer this one :

"IMAGE_FILE_MACHINE_UNKNOWN",0x0,"The content of this field is assumed to be applicable to any machine type",
  "IMAGE_FILE_MACHINE_ALPHA",0x1,"Alpha AXP, 32-bit address space",
  "IMAGE_FILE_MACHINE_ALPHA64",0x2,"Alpha 64, 64-bit address space",
   .
   .
   .


The numerical values of the equates do not seem to have any internal order, which makes it difficult to suggest optimizations. Maybe you could sort the machine types table and use a binary search function to determine the correct platform / architecture.

Hi Vortex:

I was half joking but also half hoping that maybe you could see a pattern in the allocation of machine codes.

I'll play around with it for a bit, but I have my doubts that there is a regular pattern to the order.

It is interesting to look at what lies below the surface.

Vortex

Hi Robert,

Your code is the complete and correct academic version. I am satisfied with the three machine types for practical purposes. The problem is that MS didn't specify a simple, linear enumeration, which would have made things easier:

Instead of this :

"IMAGE_FILE_MACHINE_UNKNOWN",0x0,"The content of this field is assumed to be applicable to any machine type",
  "IMAGE_FILE_MACHINE_ALPHA",0x184,"Alpha AXP, 32-bit address space",
  "IMAGE_FILE_MACHINE_ALPHA64",0x284,"Alpha 64, 64-bit address space",


I would prefer this one :

"IMAGE_FILE_MACHINE_UNKNOWN",0x0,"The content of this field is assumed to be applicable to any machine type",
  "IMAGE_FILE_MACHINE_ALPHA",0x1,"Alpha AXP, 32-bit address space",
  "IMAGE_FILE_MACHINE_ALPHA64",0x2,"Alpha 64, 64-bit address space",
   .
   .
   .


The numerical values of the equates do not seem to have any internal order, which makes it difficult to suggest optimizations. Maybe you could sort the machine types table and use a binary search function to determine the correct platform / architecture.
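
The sort-plus-binary-search suggestion could look roughly like this in Python. This is a sketch only: the table is a small subset of the posted list, and the names MACHINES, CODES and describe are made up for this example.

```python
from bisect import bisect_left

# A subset of the machine table from this thread, sorted by code
MACHINES = sorted([
    (0x014C, "Intel 386 or later processors and compatible processors"),
    (0x0200, "Intel Itanium processor family"),
    (0x01C0, "ARM little endian"),
    (0x8664, "x64"),
    (0xAA64, "ARM64 little endian"),
    (0x0184, "Alpha AXP, 32-bit address space"),
    (0x0284, "Alpha 64, 64-bit address space"),
])
CODES = [code for code, _ in MACHINES]

def describe(machine: int) -> str:
    """Binary-search the sorted codes; return the description or 'unknown'."""
    i = bisect_left(CODES, machine)
    if i < len(CODES) and CODES[i] == machine:
        return MACHINES[i][1]
    return "unknown"

print(describe(0x8664))
print(describe(0x1234))
```

With only about thirty entries the speed difference from a straight loop is unlikely to be measurable, so this is more about tidiness than performance.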

Robert

Quote from: Vortex on May 23, 2024, 02:19:55 PM
Hi Kevin and Robert,

Thanks for your kind words. It's possible to optimize the code by replacing the executable type equalizer with bitwise operations.

' It's strongly recommended to compile the code as 64-bit

DIM AS BYTE buffer[1024]
DIM p AS BYTE PTR
DIM machine AS WORD
DIM m AS WORD
DIM AS STRING mtype[16]   ' m below can range from 0 to 15

mtype[0] = "unknown"
mtype[1] = "32bit"
mtype[2] = "Intel Itanium"
mtype[6] = "64bit"

IF ARGC = 1 THEN

    PRINT "Usage : GetEXEtype64.exe filename.exe \ .dll"
    END

END IF

OPEN COMMAND$(1) FOR BINARY INPUT AS hFile

GET$ hFile, buffer, 1024

CLOSE hFile

IF ((PIMAGE_DOS_HEADER)buffer)->e_magic <> IMAGE_DOS_SIGNATURE THEN
    PRINT "The file does not contain a valid DOS header."
    END
END IF

p = (BYTE PTR)((BYTE PTR)buffer+((PIMAGE_DOS_HEADER)buffer)->e_lfanew)

IF ((PIMAGE_NT_HEADERS)p)->Signature <> IMAGE_NT_SIGNATURE THEN
    PRINT "The file does not contain a valid PE header"
    END
END IF

machine = ((PIMAGE_NT_HEADERS)p)->FileHeader.Machine

m = (machine BAND 0xFFF ) shr 8

$COMMENT

IMAGE_FILE_MACHINE_I386  =  0x014C 
IMAGE_FILE_MACHINE_AMD64 =  0x8664 
IMAGE_FILE_MACHINE_IA64  =  0x0200

Keep the last 3 hex digits of the equates (BAND 0xFFF) and divide the results by 2^8
(shr 8) to obtain 1, 6 or 2

$COMMENT

PRINT "The executable is " & mtype[m]


Hi Vortex:

Maybe you could optimize the "WhatWroteThis.bas" code below with bitwise operations?



' It's strongly recommended to compile the code as 64-bit

DIM AS BYTE buffer[1024]
DIM p AS BYTE PTR
DIM machine AS WORD
DIM m AS WORD

TYPE MachineData
  MachineConstant AS STRING
  MachineValue AS INTEGER
  MachineDescription AS STRING
END TYPE

SET WhatIsIt[] AS MachineData
  "IMAGE_FILE_MACHINE_UNKNOWN",0x0,"The content of this field is assumed to be applicable to any machine type",
  "IMAGE_FILE_MACHINE_ALPHA",0x184,"Alpha AXP, 32-bit address space",
  "IMAGE_FILE_MACHINE_ALPHA64",0x284,"Alpha 64, 64-bit address space",
  "IMAGE_FILE_MACHINE_AM33",0x1d3,"Matsushita AM33",
  "IMAGE_FILE_MACHINE_AMD64",0x8664,"x64",
  "IMAGE_FILE_MACHINE_ARM",0x1c0,"ARM little endian",
  "IMAGE_FILE_MACHINE_ARM64",0xaa64,"ARM64 little endian",
  "IMAGE_FILE_MACHINE_ARMNT",0x1c4,"ARM Thumb-2 little endian",
  "IMAGE_FILE_MACHINE_AXP64",0x284,"AXP 64 (Same as Alpha 64)",
  "IMAGE_FILE_MACHINE_EBC",0xebc,"EFI byte code",
  "IMAGE_FILE_MACHINE_I386",0x14c,"Intel 386 or later processors and compatible processors",
  "IMAGE_FILE_MACHINE_IA64",0x200,"Intel Itanium processor family",
  "IMAGE_FILE_MACHINE_LOONGARCH32",0x6232,"LoongArch 32-bit processor family",
  "IMAGE_FILE_MACHINE_LOONGARCH64",0x6264,"LoongArch 64-bit processor family",
  "IMAGE_FILE_MACHINE_M32R",0x9041,"Mitsubishi M32R little endian",
  "IMAGE_FILE_MACHINE_MIPS16",0x266,"MIPS16",
  "IMAGE_FILE_MACHINE_MIPSFPU",0x366,"MIPS with FPU",
  "IMAGE_FILE_MACHINE_MIPSFPU16",0x466,"MIPS16 with FPU",
  "IMAGE_FILE_MACHINE_POWERPC",0x1f0,"Power PC little endian",
  "IMAGE_FILE_MACHINE_POWERPCFP",0x1f1,"Power PC with floating point support",
  "IMAGE_FILE_MACHINE_R4000",0x166,"MIPS little endian",
  "IMAGE_FILE_MACHINE_RISCV32",0x5032,"RISC-V 32-bit address space",
  "IMAGE_FILE_MACHINE_RISCV64",0x5064,"RISC-V 64-bit address space",
  "IMAGE_FILE_MACHINE_RISCV128",0x5128,"RISC-V 128-bit address space",
  "IMAGE_FILE_MACHINE_SH3",0x1a2,"Hitachi SH3",
  "IMAGE_FILE_MACHINE_SH3DSP",0x1a3,"Hitachi SH3 DSP",
  "IMAGE_FILE_MACHINE_SH4",0x1a6,"Hitachi SH4",
  "IMAGE_FILE_MACHINE_SH5",0x1a8,"Hitachi SH5",
  "IMAGE_FILE_MACHINE_THUMB",0x1c2,"Thumb",
  "IMAGE_FILE_MACHINE_WCEMIPSV2",0x169,"MIPS little-endian WCE v2",
END SET
DIM AS INTEGER WhatIsItArraySize
WhatIsItArraySize = UBOUND(WhatIsIt)

IF ARGC = 1 THEN

  PRINT "Usage : WhatWroteThis.exe filename.exe \ .dll"
  END

END IF

OPEN COMMAND$(1) FOR BINARY INPUT AS hFile

GET$ hFile, buffer, 1024

CLOSE hFile

IF ((PIMAGE_DOS_HEADER)buffer)->e_magic <> IMAGE_DOS_SIGNATURE THEN
  PRINT "The file does not contain a valid DOS header."
  END
END IF

p = (BYTE PTR)((BYTE PTR)buffer+((PIMAGE_DOS_HEADER)buffer)->e_lfanew)

IF ((PIMAGE_NT_HEADERS)p)->Signature <> IMAGE_NT_SIGNATURE THEN
  PRINT "The file does not contain a valid PE header"
  END
END IF

machine = ((PIMAGE_NT_HEADERS)p)->FileHeader.Machine

FOR INTEGER ALoop = 0 TO WhatIsItArraySize
  IF machine = WhatIsIt[ALoop].MachineValue THEN
    PRINT "The file was created with a ", WhatIsIt[ALoop].MachineDescription$
  END IF
NEXT


Vortex

Hi Kevin and Robert,

Thanks for your kind words. It's possible to optimize the code by replacing the executable type equalizer with bitwise operations.

' It's strongly recommended to compile the code as 64-bit

DIM AS BYTE buffer[1024]
DIM p AS BYTE PTR
DIM machine AS WORD
DIM m AS WORD
DIM AS STRING mtype[16]   ' m below can range from 0 to 15

mtype[0] = "unknown"
mtype[1] = "32bit"
mtype[2] = "Intel Itanium"
mtype[6] = "64bit"

IF ARGC = 1 THEN

    PRINT "Usage : GetEXEtype64.exe filename.exe \ .dll"
    END

END IF

OPEN COMMAND$(1) FOR BINARY INPUT AS hFile

GET$ hFile, buffer, 1024

CLOSE hFile

IF ((PIMAGE_DOS_HEADER)buffer)->e_magic <> IMAGE_DOS_SIGNATURE THEN
    PRINT "The file does not contain a valid DOS header."
    END
END IF

p = (BYTE PTR)((BYTE PTR)buffer+((PIMAGE_DOS_HEADER)buffer)->e_lfanew)

IF ((PIMAGE_NT_HEADERS)p)->Signature <> IMAGE_NT_SIGNATURE THEN
    PRINT "The file does not contain a valid PE header"
    END
END IF

machine = ((PIMAGE_NT_HEADERS)p)->FileHeader.Machine

m = (machine BAND 0xFFF ) shr 8

$COMMENT

IMAGE_FILE_MACHINE_I386  =  0x014C 
IMAGE_FILE_MACHINE_AMD64 =  0x8664 
IMAGE_FILE_MACHINE_IA64  =  0x0200

Keep the last 3 hex digits of the equates (BAND 0xFFF) and divide the results by 2^8
(shr 8) to obtain 1, 6 or 2

$COMMENT

PRINT "The executable is " & mtype[m]
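
The masking step is easy to check outside BCX. Here is a quick Python sketch (the function name family_index is made up for this example) that applies the same expression to the three equates above and to a few other machine codes from this thread.

```python
def family_index(machine: int) -> int:
    """Same arithmetic as: m = (machine BAND 0xFFF) shr 8"""
    return (machine & 0xFFF) >> 8

codes = [
    ("I386", 0x014C), ("AMD64", 0x8664), ("IA64", 0x0200),      # the intended cases
    ("ARM", 0x01C0), ("ARM64", 0xAA64), ("RISCV64", 0x5064),    # other values from this thread
]
for name, code in codes:
    print(f"{name:8} {code:#06x} -> {family_index(code)}")
```

The three intended types map to 1, 6 and 2 as described. The same expression sends ARM to 1, RISC-V 64 to 0, and ARM64 to 10, which is not one of the four assigned mtype entries, so the trick is only safe when the input is known to be one of the three Intel/AMD types.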