gnutella protocol
I've started work on a program which utilizes the gnutella protocol (as
of version 0.48). Basically, you connect a SOCK_STREAM (TCP) socket to any
other gnutella server, send a GNUTELLA CONNECT/0.4[lf][lf]
and expect back a GNUTELLA OK[lf][lf]. At this point the
server expects you to identify yourself. You send a type 0x00 message to
whoever you just connected to, and the server responds with how many files
it is sharing and the total size of those files (in KB). You'll also get
a response from everybody connected to the machine you connected to, and so
on, until the TTL on the message expires.
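The connect step above can be sketched in Python roughly like this. It is a minimal sketch, assuming the [lf] in the text is a single "\n" byte; the function name gnutella_handshake and the max_reply cap are hypothetical, not part of the Delphi component.

```python
import socket

CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
EXPECTED_REPLY = b"GNUTELLA OK\n\n"

def gnutella_handshake(sock: socket.socket, max_reply: int = 64) -> bool:
    """Send the connect string; return True if the peer answers GNUTELLA OK."""
    sock.sendall(CONNECT_REQUEST)
    reply = b""
    # Read one byte at a time so we stop exactly at the blank line ([lf][lf])
    # and don't swallow the first binary message that follows it.
    while not reply.endswith(b"\n\n") and len(reply) < max_reply:
        chunk = sock.recv(1)
        if not chunk:  # peer closed the connection
            break
        reply += chunk
    return reply == EXPECTED_REPLY
```

You would call it on a socket from socket.create_connection((host, port)), with host and port taken from your host catcher. Reading byte by byte is slow but keeps the text reply and the binary message stream cleanly separated.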
At this point, the server will start bombarding you with information
about other servers (which fills the host catcher and gnutellanet stats).
You'll also get search requests. You're supposed to decrement the TTL and
pass them on to any other servers you're connected to (if TTL > 0). If you
have no matching files, you can simply discard the packet; otherwise you
should build a query response to that message and send it back the way
it came.
The header is fixed for all messages and ends with the size of the data
area which follows. The header contains a Microsoft GUID (Globally Unique
Identifier, for you non-Winblows people) which is the message identifier. My
crystal ball reports that "the GUIDs only have to be unique on the
client", which means that you can really put anything here, as long as
you keep track of it (a client won't respond to you if it sees the same
message id again). If you're responding to a message, be sure you haven't
seen the message id (from that host) before, copy their message ID
into your response, and send it on its way. That message ID is followed by
a function ID (one byte), which looks to be a bitmask. The function ID
indicates what to do with the packet (search request, search
response, server info, etc). The next field is a byte TTL. For every packet
you receive, you should dec (or -- for the C guys) the TTL and pass the
packet on if the TTL is still > 0 (i.e. if (--hdr.TTL) { [pass on] },
god I love C). You should also inc the hop count. Seems redundant? Well,
some people have smaller TTLs, and you have the right to drop any message
you want to based on its hop count. The header finishes up by telling us
how large the function-dependent data that follows is.
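A minimal Python sketch of packing and unpacking that 23-byte header (layout per the gnutella_header table further down). It assumes the 4-byte data length is an unsigned little-endian integer, which is how an x86 Delphi record would lay it out; the text leaves this open. uuid4 stands in for CoCreateGUID(), since any unique-enough 16 bytes will do. GnutellaHeader, new_message_id, pack_header and unpack_header are hypothetical names.

```python
import struct
import uuid
from typing import NamedTuple

# 16-byte message ID, function ID, TTL, hops, 4-byte data length = 23 bytes.
# Assumption: little-endian, unsigned data length ("<" also disables padding).
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23

class GnutellaHeader(NamedTuple):
    message_id: bytes
    function_id: int
    ttl: int
    hops: int
    data_length: int

def new_message_id() -> bytes:
    # Stand-in for CoCreateGUID(): 16 random bytes.
    return uuid.uuid4().bytes

def pack_header(h: GnutellaHeader) -> bytes:
    return struct.pack(HEADER_FORMAT, *h)

def unpack_header(data: bytes) -> GnutellaHeader:
    if len(data) < HEADER_SIZE:
        raise ValueError("need 23 bytes for a header")
    return GnutellaHeader(*struct.unpack_from(HEADER_FORMAT, data))
```

For example, a fresh ping would be pack_header(GnutellaHeader(new_message_id(), 0x00, 5, 0, 0)), using the default TTL of 5 mentioned in the TTL section.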
Searches: Easy, just build a type 0x80 packet, add a WORD for
the minimum connection speed (in kbps), then the null terminated string.
There isn't a response from people who have no match, but a result will
come back as a type 0x81 message. There will be a
gnutella_query_response_hdr followed by N gnutella_query_response_rec_hdr
and double NULL terminated filenames. To finish this up, there's a
gnutella_query_response_ftr with the full 128 bit (16 byte) client ID of
the server that found the result.
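Building a search is short enough to sketch in Python. This assumes the WORD minimum speed is little-endian and that the query text can be sent as Latin-1; both are assumptions, not something the text confirms. build_query_payload and build_query_message are hypothetical names, and hops start at 0 for a new message.

```python
import struct

SEARCH = 0x80

def build_query_payload(min_speed_kbps: int, query: str) -> bytes:
    """Function-dependent data of a 0x80 search: WORD min speed + NUL-terminated string."""
    # Assumptions: little-endian WORD, Latin-1 text.
    return struct.pack("<H", min_speed_kbps) + query.encode("latin-1") + b"\x00"

def build_query_message(message_id: bytes, ttl: int, min_speed_kbps: int, query: str) -> bytes:
    """Prefix the payload with a 23-byte header (hops = 0 for a new message)."""
    if len(message_id) != 16:
        raise ValueError("message ID must be 16 bytes")
    payload = build_query_payload(min_speed_kbps, query)
    header = struct.pack("<16sBBBI", message_id, SEARCH, ttl, 0, len(payload))
    return header + payload
```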
Downloads and Uploads: These are a piece of cake. If you want a file from a
server, you connect to the server and send an HTTP request for it. The
URL is of the form /get/[file_id]/[filename]. The file id was
returned with the search result. The gnutella HTTP server also supports
resuming a transfer via the Content-range: HTTP header. If
you're just curious, the User-Agent is gnutella. You can actually load up
Netscape and get a file from a Gnutella server. Pretty cool, eh? Here's a
dump of what an HTTP request looks like:
GET /get/293/rhubarb_pie.rcp HTTP/1.0
User-Agent: gnutella
Yes, the user-agent header and HTTP version
are required. If the server is behind a firewall which does not allow
incoming connections, the client can negotiate a push connection. This is
a function ID 0x40 packet. It contains the ClientID128 (GUID) of the
server, followed by the File ID requested, and the IP address and port of
the client.
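The download request in the dump above can be built by hand. This Python sketch assumes standard HTTP/1.0 framing (CRLF line endings, a blank line ending the request) and sends the filename as-is, exactly as in the dump; build_download_request is a hypothetical name, and resuming is left out.

```python
def build_download_request(file_index: int, filename: str) -> bytes:
    """GET /get/[file_id]/[filename] with the required HTTP version and User-Agent."""
    # Assumption: CRLF line endings; the two empty strings produce the final blank line.
    lines = [
        f"GET /get/{file_index}/{filename} HTTP/1.0",
        "User-Agent: gnutella",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("latin-1")
```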
Ok, you're all wondering: sure, the verbal theory is great, but why not
just give us the code? Here's what I've got so far. I know, it's in
Pascal, but what the heck, I'm a
Delphi programmer 18 hours a day (and a beer drinker the rest). This is a
component you can add to your palette which handles connecting to
gnutellanet. Simply drop it on your form, link up some event handlers, and
call ConnectNewHost to get started. BeginNewSearch() is also something you
might be interested in.
So, what have I got running? I've got a client which will connect to
multiple servers, host catch, do search requests (and get the results). Of
course the search monitor works too. Here are some
screenshots. It's written in Delphi, and the code is up above.
Here's a
sample log file which shows the protocol in action.
Awlright, some people have read this document and still keep asking me
things like "What's in the first 10 bytes of the header?", so for
people who can't figure out what the syntax of a Delphi record looks
like, or can't read English so well, here are some nice tables:
gnutella_header
Byte Pos | Name | Notes
0 - 15 | Message ID | A message ID is generated on the client for each new message it creates. The 16-byte value is created with the Windows API call CoCreateGUID(), which in theory will generate a new globally unique value every time you call it. See the text above for a comment about this value's uniqueness.
16 | Function ID | What message type the packet is. See the table of message types below for descriptions of the types.
17 | TTL Remaining | How many hops the packet has left before it should be dropped.
18 | Hops taken | How many hops this packet has already taken. Set the TTL on response messages to this value!
19 - 22 | Data Length | The length of the function-dependent data which follows. There has been some discussion as to whether this value is actually only 2 bytes, with the last 2 bytes being something else. Seems to work with 4 for me. There is also a question as to whether it is a signed or unsigned integer. Don't know that either; I can't get gnutella to try and send a 2^31 + 1 byte packet :)
Function IDs
ID | Description
0x00 | Ping - An empty message (datalen = 0) sent by a client requesting an 0x01 from everyone on the network. This message type should be responded to with a 0x01 and passed on.
0x01 | Ping response - Sent in response to a 0x00, this message contains the host IP and port, how many files the host is sharing, and their total size.
0x40 | Client push request - For serverants behind a firewall, where the client cannot reach the server directly, a push request message is sent, asking the server to connect out to the client and perform an upload.
0x80 | Search - This is a search message and contains the query string as well as the minimum speed.
0x81 | Search Results - These are results of a 0x80 search request. The message contains the IP address, port, and speed of the serverant, followed by a list of file sizes and names, and the ClientID128 of the serverant which found the files. ClientID128 is another 16-byte GUID. However, this GUID was created once when the client was installed, is stored in the gnutella.ini, and never changes.
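The function IDs fit naturally in a Python IntEnum. The names below are hypothetical labels for the table rows; is_reply reflects the pattern visible in the table (0x01 and 0x81 are their request's ID with the low bit set), which the routing section relies on.

```python
from enum import IntEnum

class FunctionID(IntEnum):
    PING = 0x00
    PING_RESPONSE = 0x01
    PUSH_REQUEST = 0x40
    SEARCH = 0x80
    SEARCH_RESULTS = 0x81

def describe(function_id: int) -> str:
    """Name a function ID byte, or flag it as unknown."""
    try:
        return FunctionID(function_id).name
    except ValueError:
        return f"UNKNOWN(0x{function_id:02x})"

def is_reply(function_id: int) -> bool:
    # Replies carry the low bit set: 0x00 -> 0x01, 0x80 -> 0x81.
    return function_id & 1 == 1
```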
gnutella_ping_response
Byte Pos | Name | Notes
23 - 24 | Host port | The TCP port number of the listening host.
25 - 28 | Host IP | The IP address of the listening host, in network byte order.
29 - 32 | File Count | An integer value indicating the number of files shared by the host. No idea if this is a signed or unsigned value.
33 - 36 | Files Total Size | An integer value indicating the total size of files shared by the host, in kilobytes (KB). No idea if this is a signed or unsigned value.
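A Python sketch for the ping response payload (payload byte 0 is message byte 23). It assumes port, file count and total size are unsigned little-endian integers; only the IP's network byte order comes from the table. PingResponse, parse_ping_response and build_ping_response are hypothetical names.

```python
import socket
import struct
from typing import NamedTuple

class PingResponse(NamedTuple):
    port: int
    ip: str
    file_count: int
    total_kb: int

def parse_ping_response(payload: bytes) -> PingResponse:
    if len(payload) < 14:
        raise ValueError("ping response payload is 14 bytes")
    (port,) = struct.unpack_from("<H", payload, 0)      # assumed little-endian
    ip = socket.inet_ntoa(payload[2:6])                 # network byte order
    file_count, total_kb = struct.unpack_from("<II", payload, 6)
    return PingResponse(port, ip, file_count, total_kb)

def build_ping_response(port: int, ip: str, file_count: int, total_kb: int) -> bytes:
    return (struct.pack("<H", port) + socket.inet_aton(ip)
            + struct.pack("<II", file_count, total_kb))
```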
gnutella_query_hdr
Byte Pos | Name | Notes
23 - 24 | Minimum speed | The minimum speed of serverants which should perform the search. This is entered by the user in the "Minimum connection speed" edit box.
25 + | Search query | A NULL-terminated character string which contains the search request.
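On the receiving side, a 0x80 payload splits into its two fields like this. This is a minimal Python sketch, again assuming a little-endian WORD and Latin-1 text; parse_query_payload is a hypothetical name.

```python
import struct
from typing import Tuple

def parse_query_payload(payload: bytes) -> Tuple[int, str]:
    """Split a 0x80 payload into (minimum speed, search string)."""
    if len(payload) < 3:
        raise ValueError("too short for a WORD plus a NUL-terminated string")
    (min_speed,) = struct.unpack_from("<H", payload, 0)  # assumed little-endian
    end = payload.find(b"\x00", 2)
    if end == -1:
        raise ValueError("search string is not NUL-terminated")
    return min_speed, payload[2:end].decode("latin-1")
```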
gnutella_query_response_hdr
Byte Pos | Name | Notes
23 | Num recs | Number of gnutella_query_response_recs which follow this header.
24 - 25 | Host port | The listening port number of the host which found the results.
26 - 29 | Host IP | The IP address of the host which found the results, in network byte order.
30 - 33 | Host Speed | The speed of the host which found the results. This may be incorrect. I would assume that only 2 bytes would be needed for this. The last 2 bytes may be used to indicate something else.
34 + | Array of gnutella_query_response_recs | A gnutella_query_response_rec for each result found.
Last 16 bytes | gnutella_query_response_ftr | The clientID128 of the host which found the results. This value is stored in the gnutella.ini and is a GUID created with CoCreateGUID() the first time gnutella is started.
gnutella_query_response_rec
Byte Pos | Name | Notes
+0 (offset from start of rec) | File Index | Each file indexed on the server has an integer value associated with it. When gnutella scans the hard drive on the server, a sequential number is given to each file as it is found. This is the file index.
+4 (offset from start of rec) | File Size | The size of the file (in bytes).
+8 (offset from start of rec) | File Name | The name of the file found. No path information is sent, just the file's name. The filename field is double-NULL terminated.
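Putting the response header, records and footer together, a 0x81 payload could be parsed like this. It is a Python sketch with offsets relative to the payload (message byte 23 is payload byte 0), assuming all integers are unsigned little-endian and filenames are Latin-1; QueryHit, QueryResult and parse_query_hit are hypothetical names.

```python
import socket
import struct
from typing import List, NamedTuple

class QueryResult(NamedTuple):
    file_index: int
    file_size: int
    file_name: str

class QueryHit(NamedTuple):
    port: int
    ip: str
    speed: int
    results: List[QueryResult]
    client_id: bytes

def parse_query_hit(payload: bytes) -> QueryHit:
    if len(payload) < 11 + 16:
        raise ValueError("too short for header plus ClientID128 footer")
    num_recs = payload[0]
    (port,) = struct.unpack_from("<H", payload, 1)
    ip = socket.inet_ntoa(payload[3:7])            # network byte order
    (speed,) = struct.unpack_from("<I", payload, 7)
    offset = 11                                    # first record (message byte 34)
    results = []
    for _ in range(num_recs):
        file_index, file_size = struct.unpack_from("<II", payload, offset)
        # Name starts at +8 and ends at the double NUL; raises ValueError if missing.
        end = payload.index(b"\x00\x00", offset + 8)
        name = payload[offset + 8:end].decode("latin-1")
        results.append(QueryResult(file_index, file_size, name))
        offset = end + 2                           # skip the double NUL
    return QueryHit(port, ip, speed, results, payload[-16:])
```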
gnutella_push_req
Byte Pos | Name | Notes
23 - 38 | ClientID128 | The ClientID128 GUID of the server the client wishes the push from.
39 - 42 | File Index | Index of the file requested. See query_response_rec for more info.
43 - 46 | Requester IP | IP address of the host requesting the push, in network byte order.
47 - 48 | Requester Port | Port number of the host requesting the push.
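A push request payload is just the four fields concatenated, 26 bytes in all. This Python sketch assumes the file index and port are little-endian like the other integer fields; only the IP's network byte order is stated in the table. build_push_payload is a hypothetical name.

```python
import socket
import struct

def build_push_payload(server_client_id: bytes, file_index: int,
                       my_ip: str, my_port: int) -> bytes:
    """ClientID128 (16) + file index (4) + requester IP (4) + requester port (2)."""
    if len(server_client_id) != 16:
        raise ValueError("ClientID128 must be 16 bytes")
    return (server_client_id
            + struct.pack("<I", file_index)   # assumed little-endian
            + socket.inet_aton(my_ip)         # network byte order
            + struct.pack("<H", my_port))     # assumed little-endian
```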
Routing
An issue everyone wants to ask me about nowadays is routing. "Do I
forward every packet I see to every connected host?" Holy Jesus no! That
would swamp the network with duplicate packets (which it already is).
Here's the secret. For simplicity's sake, TTL is not discussed in this
section.
[Diagram: node 1 and its neighborhood of gnutellanet, nodes 1 through 13] (Forgive
the non-straight lines, but the internet's like that.)
Imagine yourself as node 1 in the above diagram. You have direct
gnutellanet (physical socket) connections to nodes 2, 3, 4, and 5. You
have reachable hosts at nodes 6 thru 13.
- You get a ping message (function 0x00) from 2 with a message id of x.
- Look up [message x, socket ???] in your message routing table.
- Not there? Save [message x, socket 2] in the list.
- Respond with a Ping Response (0x01), message id x, to node 2.
- Send the function 0x00 message to nodes 3, 4, and 5 (not 2!!).
- Node 3 will respond with a Ping Response (0x01), message id x.
- Forward the message to whoever in the list has [message x, socket ???]. Since this packet is being routed and not broadcast, there is no need to check whether it is a duplicate, as routed messages don't make loops.
- Do the same thing with responses from 4 and 5.
- Since 3 thru 5 will also pass the message on to 8 thru 13, you'll also get a 0x01 from them too.
- Problem: Node 3 is connected to 10, who is connected to 4, who is connected to you! It's OK! You look up [message x, socket ???] in your route list... It's already there! You drop the message, do not respond to 4, and do not forward it to anyone!
Here are the basic mechanics, described in the example above:
- If the low bit of the function ID (f) is 0, look for [message x, socket ???]. If it's already there, drop the message. If it isn't, add it as [message x, socket s], respond to socket s, and forward to all connected clients except socket s (the one you got the message from).
- If the low bit of the function ID (f) is 1, look for the socket which matches [message x, socket ???] and forward the message to that connection only.
- If the low bit of the function ID (f) is not 0 or 1, you need to stop letting an infinite number of monkeys use your machine while they work on their Hamlet script.
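The low-bit rule can be sketched as a small Python route table. Connections are represented by any hashable handle (plain node numbers in the test, mirroring the example); the Router class and its handle method are hypothetical. It follows the rule literally, so every function ID with the low bit clear is flooded, and replies whose request was never seen are dropped. A real client would probably also want to expire old entries, since this table only grows.

```python
from typing import Dict, Hashable, List, Tuple

class Router:
    """Route table keyed by message ID: remembers which connection a request came in on."""

    def __init__(self) -> None:
        self.routes: Dict[bytes, Hashable] = {}

    def handle(self, message_id: bytes, function_id: int, came_from: Hashable,
               connections: List[Hashable]) -> Tuple[bool, List[Hashable]]:
        """Return (is_new_request, connections_to_forward_to)."""
        if function_id & 1 == 0:
            # Broadcast: a message ID we've seen before is a loop, so drop it.
            if message_id in self.routes:
                return False, []
            self.routes[message_id] = came_from
            return True, [c for c in connections if c != came_from]
        # Routed reply: send it back only toward where the request came from.
        back = self.routes.get(message_id)
        return False, ([back] if back is not None else [])
```

When is_new_request is True, the caller is also the one who decides whether to answer (always for a ping, only with matching files for a search).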
TTL and Hops
"How many computers the packet can go through before it
will stop being passed around like a whore" - Nouser (#gnutella on
efnet)
TTL: anyone who knows anything about TCP/IP will
tell you that TTL stands for Time To Live. Basically, when a packet
(or message, in our case) is sent out, it is stamped with a TTL, and each
host that receives the packet decrements the TTL. If the TTL is zero,
the packet is dropped; otherwise it is routed to the next host in the
route. Gnutella TTLs work similarly. When a NEW message is sent from your
host, the TTL is set to whatever you have in your Config | TTL | My
TTL setting. When the packet is received by the next host in line, the TTL
is decremented. Then that TTL is checked against that host's Config | TTL
| Max TTL setting. The lower of the two numbers is placed in the outgoing
TTL field. If the outgoing TTL is zero, the packet is dropped. [Capn's
Note: I'm not positive about this next part] Then the Hops field of
the message is incremented and checked. If this number is greater than the
Max TTL setting, the packet is dropped. [End Capn's Note] This
method means that even if you set your TTL to 255 (the maximum value), odds
are the TTL will be set to the default (5) by the next host in your chain.
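The TTL and hops rules above condense into one Python function. This is a sketch of the described behavior, including the hop check the author flagged as uncertain; age_for_forwarding and DEFAULT_MAX_TTL are hypothetical names. It tests ttl <= 0 rather than mirroring if (--hdr.TTL), since decrementing an unsigned byte that arrives as 0 would wrap to 255 in C.

```python
from typing import Optional, Tuple

DEFAULT_MAX_TTL = 5  # the default mentioned in the text

def age_for_forwarding(ttl: int, hops: int,
                       max_ttl: int = DEFAULT_MAX_TTL) -> Optional[Tuple[int, int]]:
    """Return the (ttl, hops) for the forwarded copy, or None to drop the packet."""
    ttl = min(ttl - 1, max_ttl)  # decrement, then clamp to our Max TTL setting
    if ttl <= 0:
        return None
    hops += 1
    # The uncertain part (see Capn's Note): drop if hops now exceed Max TTL.
    if hops > max_ttl:
        return None
    return ttl, hops
```

With the default Max TTL, a message sent with TTL 255 leaves the next host as age_for_forwarding(255, 0), i.e. TTL 5 and hops 1, which matches the observation about the default.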