Dark GDK / C strings and null characters

Author
Message
Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 15:49
I am trying to do the following (in order)
1. Assign memory to a c string.
2. Copy a C++ string into this c string.

The problem is that the C++ string often has null characters in it. With c strings a null character indicates the end of the string which means that the c string is only part filled with the C++ string.

I'm doing this because I want to point a WSABUF to the C string; I don't think there is a way of doing this properly in a way that points to a C++ string.

This may not make much sense so I wrote some code to help explain. _AddInt adds an integer's bytes to the buffer; the idea being that later on this integer is taken out of the buffer. Small integers only use the first byte and leave 3 bytes of null characters:
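The original listing is not preserved at this point. As a minimal sketch of the situation described, assume a hypothetical `add_int` helper (standing in for `_AddInt`, whose real code is not shown) that appends the four bytes of a 32-bit integer, least significant byte first. The second function, also hypothetical, reports the length a C string routine would see.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for _AddInt: appends the four bytes of a 32-bit
// integer to the buffer, least significant byte first.
void add_int(std::string& buffer, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        buffer.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Length as a C string function sees it: counts up to the first null byte.
std::size_t c_string_length(const std::string& buffer) {
    return std::strlen(buffer.c_str());
}
```

After `add_int(s, 1)` the std::string holds four bytes, but anything that treats `s.c_str()` as a C string sees only the first one.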




Any idea of how I could get around this problem?
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 18:40 Edited at: 11th Jul 2008 18:41
I've only spent a little time with WSABUF and it was a long time ago, so maybe I'm not understanding what direction the data is supposed to flow.

Let's establish some terminology first. There's some confusion in the use of the term string. From the C point of view, a string is considered to be a nul terminated array of characters. The same holds true in C++ but you also have the std::string class, which makes you have to ask what's being referred to. I figure you're trying to copy the content of sFormatted to your ptrCharArray. Are you saying that the string represented by sFormatted isn't nul terminated?

Lilith, Night Butterfly
I'm not a programmer but I play one in the office
Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 19:40
As you can see I'm no good with terminology, but at least I now know what to call them.

Yes, the problem is that the standard string 'sFormatted' contains several null characters that are not supposed to be indicating the end of the string.

I was thinking about temporarily replacing the null characters with another character, but I'm not sure there is a character that is never used to represent data (I hope that makes sense).
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 19:54
That's strange. There's no reason a string class object should have more than one null in it. That should represent the terminator. Now it is possible to cause this to happen through unnatural means. I've done it before by manipulating the string without concern for the fact that the string object keeps track of its length only through what it does via its own functions.

Can you give an example of what the string looks like with the noxious nulls embedded?

Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 20:07 Edited at: 11th Jul 2008 20:10
Using the code in my original post putting a breakpoint at the end this is what sFormatted looks like:


What has happened is the _AddInt function has added 4 bytes of data belonging to an integer which is in this case '1'. The integer takes up elements [0][1][2][3] and [1][2][3] are null characters.

With small integers, it seems that not all of the bytes contain data; in this case only one byte is used. The null characters are needed because when _GetInt is used at the other end, the receiver must know the size of the data it is getting; the size is therefore fixed at 4 bytes for integers, whether or not all of them are used. The bytes that are not used are set to null.

The reason for this is to create a way of formulating packets for networking; the idea is as follows:

At the sending end:
-Add integer to buffer
-Send buffer

Then at the receiving end:
-Receive buffer
-Get integer from buffer
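The add/get steps above can be sketched as a round trip. Both helpers are hypothetical (the real _AddInt and _GetInt are not shown in the thread), and the fixed little-endian byte order is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append a 32-bit integer as four little-endian bytes.
void add_int(std::string& buffer, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        buffer.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical helper: read four bytes at `offset` back into an integer and
// advance `offset` past them. Zero bytes are ordinary data here.
std::uint32_t get_int(const std::string& buffer, std::size_t& offset) {
    assert(offset + 4 <= buffer.size());
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(buffer[offset + i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    offset += 4;
    return value;
}
```

Because the reader always consumes exactly four bytes, it never needs a terminator to find where the integer ends.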

The way I see it I have the following options:
-Replace the null characters with something else for sending, and then convert the c string to a std string and put the null characters back in when received.
-Allocate a different type of memory to the WSABUF structure (instead of C strings).
-Find a way to tell C strings that I don't want null terminating characters to indicate the end of the string.

I don't know if any of those are possible, or if they would work.
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 20:16
Why aren't you building WSABUF the way it was intended?

Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 20:33
I thought I was. What, then, is the way it was intended?
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 20:54
I've only done this once, a couple of years ago, while trying to work out an SMTP tester.





WSABUF is the definition of a struct that contains two members. You create an object of that struct type (theBuffer, in my example) and assign the members individually. Then you use whatever code requires that you pass the address of the WSABUF to it. You're bundling the information to pass to the function rather than passing the data individually.
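The listing this paragraph refers to is not preserved. A minimal sketch of the idea, assuming a stand-in struct `WsaBufLike` with the same two members as WSABUF (`len` and `buf`) so that it compiles without the Winsock headers; `describe` is a hypothetical helper name:

```cpp
#include <cassert>

// Stand-in for the Winsock WSABUF struct so the sketch compiles anywhere;
// on Windows the real type comes from <winsock2.h>.
struct WsaBufLike {
    unsigned long len;  // number of bytes in the buffer
    char* buf;          // pointer to the buffer
};

// Assign the two members individually so the struct describes `storage`.
WsaBufLike describe(char* storage, unsigned long size) {
    assert(storage != nullptr);
    WsaBufLike theBuffer;
    theBuffer.buf = storage;
    theBuffer.len = size;
    return theBuffer;
}
```

The struct owns nothing; it only bundles a pointer and a byte count to hand to whichever function takes its address.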

I could be off here, but even though the nul terminates the string, winsock routines that pass/receive text data expect the string to end in a CR/LF combination. Or maybe it's just one or the other.

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 11th Jul 2008 20:55
It's totally legitimate for a C++ string to contain null bytes - it's not a C string, so it doesn't need a terminator.

I've not used WSABUF before, but I have used arrays and C++ strings together before - this looks like a simple extension to that.

Anyway, building a WSABUF from a C++ string:
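The posted code is missing here. One way this might look, as a sketch only: `WsaBufLike` is a stand-in for WSABUF (same `len` and `buf` members) and `to_buffer` is a hypothetical helper. The bytes are copied into a separate `std::vector<char>` because `buf` is a non-const `char*`.

```cpp
#include <string>
#include <vector>

// Stand-in for WSABUF; the real struct is declared in <winsock2.h>.
struct WsaBufLike {
    unsigned long len;
    char* buf;
};

// Copy every byte of `source`, embedded nulls included, into `storage`
// and describe that storage. `storage` must outlive the returned struct.
WsaBufLike to_buffer(const std::string& source, std::vector<char>& storage) {
    storage.assign(source.begin(), source.end());
    WsaBufLike result;
    result.len = static_cast<unsigned long>(storage.size());
    result.buf = storage.empty() ? nullptr : storage.data();
    return result;
}
```

The length comes from `size()`, not from `strlen`, so null bytes inside the string are copied like any other byte.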


Building a C++ string from a WSABUF:
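The posted code is missing here as well. A sketch of the reverse direction, again with the stand-in `WsaBufLike` and a hypothetical helper name, using the `std::string` constructor that takes a pointer and an explicit length:

```cpp
#include <string>

// Stand-in for WSABUF; the real struct is declared in <winsock2.h>.
struct WsaBufLike {
    unsigned long len;
    char* buf;
};

// Build a std::string from exactly `len` bytes; nulls are kept as data.
std::string from_buffer(const WsaBufLike& source) {
    if (source.buf == nullptr || source.len == 0) {
        return std::string();
    }
    return std::string(source.buf, source.len);
}
```

Passing only the pointer, as in `std::string(source.buf)`, would instead stop at the first null byte.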


Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 21:02
Quote: "It's totally legitimate for a C++ string to contain null bytes - it's not a C string, so it doesn't need a terminator."


But should it not have a nul terminator for compatibility with other related functions? At least the ones that don't try to modify it?

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 11th Jul 2008 21:09
Quote: "But should it not have a nul terminator for compatibility with other related functions?"

That's what the c_str() member function is for. The 'null' issue is something that has to be dealt with on a case-by-case basis by the programmer.

Quote: "At least the ones that don't try to modify it?"

Look at the declaration of data() and c_str(). They both pass pointers to const chars - you're not allowed to modify the contents directly via pointers as you could mess up the internals of the string object (such as extending it beyond its current safe storage capacity).

If you need to change the contents, then you should create a new string as in my example, or replace the contents using the string's own member functions - they 'understand' the string internals and can carry out the update safely.
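As a small sketch of updating through member functions rather than through the pointer returned by data() or c_str() (the two helper names are hypothetical):

```cpp
#include <cstddef>
#include <string>

// Overwrite one byte through the string's own interface.
// at() throws std::out_of_range if the index is past the end.
void set_byte(std::string& s, std::size_t index, char value) {
    s.at(index) = value;
}

// Replace the whole contents from a raw buffer with an explicit length,
// so embedded null bytes are preserved.
void replace_contents(std::string& s, const char* data, std::size_t size) {
    s.assign(data, size);
}
```

Both calls leave the string's recorded length consistent with its contents, which is the property raw pointer writes can break.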

Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 21:28
IanM, I tried a similar method to the one you have posted but the problem remains the same. Strcpy, memcpy and copying each character individually all stop prematurely when a NULL character is reached.
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 21:32
But the string you were passing had nulls embedded by the function you were performing on it.

Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 21:54
So there is absolutely no way to have a standard string with nulls embedded into it transferred into a WSABUF structure?

I have a new plan of action to avoid this problem but still maintain the AddInt, GetInt part:

In the buffer, each integer has a byte before it that specifies how many bytes the integer is using; this way there are no null characters and things should be smashing

So, if 1 was added to the buffer it would look like this:
[0]: 1
[1]: 1

if we added 260 (which uses 2 bytes and in byte form is 4 and 1) to the buffer instead of 1 it would look like this:
[0]: 2
[1]: 4
[2]: 1

Could work..
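A sketch of the count-prefixed layout described above, assuming a hypothetical `add_int_compact` helper and least-significant-byte-first order (which matches the 260 example). Note that the scheme does not remove every zero byte: the value 0 itself, or a value such as 65536 (bytes 0, 0, 1), still puts nulls in the buffer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical encoder: one count byte, then only the bytes needed for the
// value, least significant byte first.
void add_int_compact(std::string& buffer, std::uint32_t value) {
    std::string bytes;
    do {
        bytes.push_back(static_cast<char>(value & 0xFF));
        value >>= 8;
    } while (value != 0);
    buffer.push_back(static_cast<char>(bytes.size()));
    buffer += bytes;
}
```

Encoding 1 gives the bytes 1, 1 and encoding 260 gives 2, 4, 1, as in the two examples above.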
IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 11th Jul 2008 21:56 Edited at: 11th Jul 2008 21:57
@Michael,
I can understand strcpy stopping at a null - that is what it was designed to do.

The memcpy however will copy the number of characters/bytes you tell it to. If it didn't work then either 1) you told it the wrong number of bytes to copy, or 2) your check was faulty.

Try it out yourself:
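The test code posted here is not preserved. A comparable check might look like the sketch below; the two function names are hypothetical, and each returns how many data bytes actually arrived in the destination.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copy with strcpy, which stops at the first null byte.
// Returns the number of data bytes copied (terminator excluded).
std::size_t copy_with_strcpy(const std::string& source, std::vector<char>& dest) {
    dest.assign(source.size() + 1, '#');  // '#' marks bytes never written
    std::strcpy(dest.data(), source.c_str());
    return std::strlen(dest.data());
}

// Copy with memcpy, which copies exactly the number of bytes requested.
std::size_t copy_with_memcpy(const std::string& source, std::vector<char>& dest) {
    dest.assign(source.size(), '#');
    if (!source.empty()) {
        std::memcpy(dest.data(), source.data(), source.size());
    }
    return dest.size();
}
```

For a five-byte string with a null in the middle, the strcpy version copies only the bytes before the null, while the memcpy version copies all five.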


Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 11th Jul 2008 22:38
Ah!! Both memcpy and strcpy do in fact work!

While I was testing all this I was just using the debugger which showed this:


for the following code:


Anyway, this made me happy, especially the smiley face:



Thanks for all your help guys!
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 22:59 Edited at: 11th Jul 2008 23:04
What happens if you replace



with

cout << buffer.buf << endl;

??

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 11th Jul 2008 23:34
When you use the string object on cout, the whole string is displayed, even the null byte in the middle. When you use a char*, cout assumes it's a C string and stops at the null.
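This difference can be checked without a console by writing to a `std::ostringstream` instead of cout; both overloads of `operator<<` behave the same way on any output stream. The function names below are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Stream the std::string object: every byte in size() is written.
std::size_t bytes_written_as_string(const std::string& s) {
    std::ostringstream out;
    out << s;
    return out.str().size();
}

// Stream the char pointer: output stops at the first null byte.
std::size_t bytes_written_as_char_pointer(const std::string& s) {
    std::ostringstream out;
    out << s.c_str();
    return out.str().size();
}
```

For a three-byte string holding 'A', a null and 'B', the first function reports three bytes written and the second reports one.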

Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 11th Jul 2008 23:41
But why is there a null byte in the middle? Do any of the WS send functions accept or know what to do with a null byte in the middle of the intended output?

Michael P
18
Years of Service
User Offline
Joined: 6th Mar 2006
Location: London (UK)
Posted: 12th Jul 2008 00:35
I think (and hope) that they just ignore the null characters but I will test this next week some time.
Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 12th Jul 2008 00:41
But what is the purpose of putting 260 as a binary at the beginning of an otherwise text string?

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 12th Jul 2008 01:08
I think I may see where the confusion is ...

You are seeing the char* in WSABUF as a C string - it's not, or rather, that's not the way you should be seeing it. Instead, think of it as a pointer to a fixed-size buffer in memory. The size of the buffer isn't determined by a terminating null byte, as it would be if it were a C string, but by the 'len' member of WSABUF.

As confirmation of that, MSDN defines the structure as follows: 'The WSABUF structure enables the creation or manipulation of a data buffer.' (Emphasis mine)

Remember that the only difference between a pointer to char and a C string is in how you look at and deal with it. From the POV of the compiler, there is no difference.

Perhaps in this instance MS should have defined the buf element as an LPVOID or LPBYTE rather than char*.

Quote: "Do any of the WS send functions accept or know what to do with a null byte in the middle of the intended output?"

No, because the WSA functions are only dealing with data. They don't know (or need to know) whether the data is char, float, int or whatever - they just transfer the raw data in byte form. It's your responsibility to get data into and out of these buffers in a form you can use.
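As a sketch of getting typed data into and out of a plain byte buffer, assume two hypothetical helpers, `put` and `take`, that copy the raw bytes of a value. They use the host's own byte order and type sizes, so this only illustrates the idea; two machines exchanging such data would have to agree on both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append the raw bytes of a trivially copyable value to a byte buffer.
template <typename T>
void put(std::vector<char>& buffer, const T& value) {
    const char* bytes = reinterpret_cast<const char*>(&value);
    buffer.insert(buffer.end(), bytes, bytes + sizeof(T));
}

// Read a value of type T back from `offset` and advance `offset` past it.
template <typename T>
T take(const std::vector<char>& buffer, std::size_t& offset) {
    assert(offset + sizeof(T) <= buffer.size());
    T value;
    std::memcpy(&value, buffer.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}
```

The buffer's `data()` pointer and `size()` are what would go into the `buf` and `len` members; nothing in between is interpreted as text.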

Lilith
16
Years of Service
User Offline
Joined: 12th Feb 2008
Location: Dallas, TX
Posted: 12th Jul 2008 07:01
It's always been my understanding that Internet traffic uses strictly 7 of the 8 bits in a byte, right? Strictly printable characters and a few control characters. Take a look at this example on MSDN:

http://msdn.microsoft.com/en-us/library/ms741542(VS.85).aspx

Note the buffer length is set to 1K but the string being sent is maybe a dozen characters. The len variable is there to tell the function how much space is available in the buffer but the string doesn't have to fill that buffer. Binary data is transferred in a form that's encoded into text characters.

IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 12th Jul 2008 12:03
Quote: "It's always been my understanding that Internet traffic uses strictly 7 of the 8 bits in a byte, right?"

No, network traffic is always 8 bit. Applications may apply limits (email used to for example, hence uuencoding) but not the network, otherwise you would never be able to send binary data across the network without putting some sort of encoding around it.

It looks to me that the example is sending 1k bytes, but only populating a few at the front. I've also been looking at code on the CodeProject site, and the examples I've seen that use IOCP treat the WSABUF as data, not as strings.

It doesn't make sense to me that MS would implement the standard BSD sockets in binary, then cripple their high-performance additions by switching from a binary transmission to a C String only transmission.
