Making a distributed computing network in Python

Question

I have a huge amount of data to process, and I'm using every machine I can get: my parents' computers, my girlfriend's computer, my own computers, and my brothers' computers.

They are OK with lending me some of their processing power, and the processing program only uses 1 of the 4 cores on each computer. I'll set up something that launches the slaves when their computers start up.

I coded this "distributed computing program" myself. I just learned about sockets through Google, and I want to make sure that I'm not making a big mistake.

From what I understand, a socket is one-way only: A can only send data to B, and if B needs to send data to A, then another socket on another port needs to be opened.

The "distributor" is the program that orchestrates the computing: it sends data to crunch to all the slaves. It runs on a cheap dedicated server.
The "slaves" request data from the distributor, compute, store the result, then ask for more data to crunch.

The "distributor" has a registration_port_distributor: 15555
The "slaves" have a registration_port_slave: 14444 (yes, the same for each slave)
work_port = registration_port_distributor + 1

The distributor boots
start of the loop
    wait for a slave connection
    a slave connects to port 15555 (registration_port_distributor) and tells the distributor "I am 'slave_name', give me 2 ports to work on, and reply on my port 14444 (registration_port_slave)"
    the distributor connects to the slave on port 'registration_port_slave' and gives it "work_port" (data_reception_port) for receiving data and work_port+1 (data_request_port) so that the slave can request new data to crunch
    work_port is incremented by 2

From this point, a slave can receive data to process from a connection on 'data_reception_port', and it can ask for new data to crunch from a connection on 'data_request_port'.

The only problem I can see here is if 2 slaves try to connect at the same time, but that is easily fixed using a while loop on each slave with a 5 second sleep for reattempting a connection.

What do you think?

Thanks.

PS: yes, the slaves do not send back the results; I will collect them manually or implement that later.

PPS: the code will be uploaded to my GitHub later; it is a mess right now, as I am testing various things.

A TCP socket can be used for communicating in both directions, or in only a single direction (depending on the options you provide when creating it), for a single connection.
– tkhurana96, Commented Feb 13, 2018 at 13:29
Oh, all right, I'm going to re-read the documentation and adapt the code then.
– sliders_alpha, Commented Feb 13, 2018 at 13:37
"From what I understand a socket is one way only." Wrong. By default, a connected (TCP) socket is bi-directional.
– Serge Ballesta, Commented Feb 13, 2018 at 13:38
"The only problem I can see here is if 2 slaves try to connect at the same time." That is what listen/accept is made for. As long as the number of simultaneous connection requests is not greater than the listen backlog, they are gently queued and then processed one at a time with accept.
– Serge Ballesta, Commented Feb 13, 2018 at 13:43

Am_I_Helpful · Accepted Answer · 2018-02-13 19:25:15Z

"From what I understand, a socket is one-way only: A can only send data to B, and if B needs to send data to A, then another socket on another port needs to be opened."

As several people have already mentioned in the comments, a TCP socket is bi-directional, and you can use the same socket for two-way communication. The application has to be coded so that both sides understand each other.
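To illustrate the point, here is a minimal sketch in which a single TCP connection carries data in both directions. The names serve_once and round_trip are made up for this example, and binding to port 0 on 127.0.0.1 (so the OS picks a free port) is only an assumption for a local demo. It uses socket.create_server, which needs Python 3.8 or later.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read a message, and reply on the same socket."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024)          # data flowing client -> server
        conn.sendall(b"echo: " + request)  # data flowing server -> client


def round_trip(message):
    """Send message to a local server and return what comes back."""
    # Port 0 asks the OS for any free port (demo assumption only).
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        server_thread = threading.Thread(target=serve_once, args=(server_sock,))
        server_thread.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            # The server closes after replying, so read until end of stream.
            reply = b"".join(iter(lambda: client.recv(1024), b""))
        server_thread.join()
    return reply


if __name__ == "__main__":
    print(round_trip(b"hello"))
```

The single recv(1024) on the server side is fine for a short demo message, but TCP is a byte stream, so a real protocol would need some form of message framing.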

"From this point, a slave can receive data to process from a connection on 'data_reception_port', and it can ask for new data to crunch from a connection on 'data_request_port'."

Once you change your application model as explained above, you no longer need two separate ports/connections on each side.
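As a rough sketch of what that single-connection model could look like: the worker (the "slave" in the question) connects once, sends a request, and receives its next job on the same socket. Everything here is an assumption made for illustration, not the asker's actual code: the b"GET" and b"DONE" messages, the 4-byte length prefix, and the function names.

```python
import socket
import struct
import threading


def send_msg(sock, payload):
    """Send one message prefixed with its length (4-byte big-endian)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def distributor(server_sock, jobs):
    """Serve one worker: answer each b"GET" with a job, then b"DONE"."""
    conn, _addr = server_sock.accept()
    with conn:
        pending = list(jobs)
        while True:
            if recv_msg(conn) != b"GET":
                break
            if not pending:
                send_msg(conn, b"DONE")  # hypothetical end-of-work marker
                break
            send_msg(conn, pending.pop(0))


def worker(host, port):
    """Request jobs over one connection until the distributor says DONE."""
    received = []
    with socket.create_connection((host, port)) as sock:
        while True:
            send_msg(sock, b"GET")
            job = recv_msg(sock)
            if job == b"DONE":
                break
            received.append(job)  # a real worker would process the job here
    return received


def run_demo(jobs):
    """Run one distributor and one worker locally on an OS-chosen port."""
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        t = threading.Thread(target=distributor, args=(server_sock, jobs))
        t.start()
        result = worker("127.0.0.1", port)
        t.join()
    return result
```

Calling run_demo([b"chunk-1", b"chunk-2"]) hands both chunks to the worker over one connection. Since the worker always initiates the connection, the distributor never needs to connect back, so only the distributor's port has to be reachable. A job whose payload is literally b"DONE" would clash with the marker, so a real protocol would want a separate message-type field.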

"The only problem I can see here is if 2 slaves try to connect at the same time, but that is easily fixed using a while loop on each slave with a 5 second sleep for reattempting a connection."

Please read about the backlog in socket communications. If more connection requests arrive than can be served at the moment, they are queued (how many requests can wait in the queue depends on the backlog parameter). Check the documentation of the socket.listen([backlog]) function for more information.
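A small sketch of that queuing behaviour, under the assumption of a local demo on 127.0.0.1 with an OS-chosen port: several clients connect before the server has called accept() even once, and the server then picks them up one at a time. The function name queued_connections and the "worker-N" messages are invented for this example.

```python
import socket


def queued_connections(n_clients, backlog=5):
    """Connect n_clients before any accept() call, then accept them in turn."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients = []
    names = []
    try:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server.listen(backlog)          # pending connections wait in this queue
        port = server.getsockname()[1]

        # All clients connect and send while the server is not accepting yet.
        for i in range(n_clients):
            c = socket.create_connection(("127.0.0.1", port))
            c.sendall(b"worker-%d" % i)
            clients.append(c)

        # Only now does the server take the queued connections, one by one.
        for _ in range(n_clients):
            conn, _addr = server.accept()
            with conn:
                names.append(conn.recv(1024))
    finally:
        for c in clients:
            c.close()
        server.close()
    return names


if __name__ == "__main__":
    print(queued_connections(3))
```

With n_clients kept at or below the backlog, no client needs a sleep-and-retry loop just because another client connected at the same moment. How an operating system treats connection attempts beyond the backlog varies, so a retry on connection failure can still be useful for the case where the distributor is not running at all.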

I hope this answers your questions. Please feel free to query further in case of any confusion.

1 Comment

sliders_alpha Over a year ago

Thanks, I did not see that in the tutorial :)
