Wednesday Paramiko



Hi! This is my first blog post ever. Today at work we ran into an issue I really enjoyed debugging, so I wanted to write about it.

With two colleagues, I built and maintain an app whose goal is to configure the routers we send to our clients. I work at a B2B ISP, so the services we provide range from a single Internet access point to advanced MPLS tunneling. We have a bunch of templates for each service and device. This kind of app is commonly known as a ZTP solution: Zero-Touch Provisioning.

Technicians are told to physically plug our routers into racks dedicated to our ZTP app. Each device gets one Ethernet port for IP connectivity and a console port linked to a terminal server.

Each rack thus contains an access switch and a terminal server where every device's port lands.

.-----------------.
|                 |       .---------.
|     switch      | ----> | gateway |
| terminal server |       ---------
|                 |       And the switch is connected to a gateway!
|    #device1     |
|    #device2     |
|    #device3     |
|    #device4     |
|       ...       |
|                 |
 -----------------

The gateway is a simple Debian virtual machine where we run a homemade DHCP server. New devices announce themselves to the DHCP server, which then makes a request to our backend to signal the presence of a new device. The backend registers those entries and starts a state machine for each device, beginning by fetching information the DHCP server could not know by itself, like the serial number or the attached modules.

Someone told us they had plugged in a device that was not appearing in our app's interface. The logs said:

No matching kex algorithm found.

Well ... that was annoying. The device offered the following SSH key exchange algorithms: diffie-hellman-group-exchange-sha1 and diffie-hellman-group14-sha1.

Maybe the device was not up-to-date? Luckily we still had console access through the terminal server to check the version. Oh wait ... it actually is up-to-date. Why would I not be able to connect to this device while I could still connect to other devices running the same version?
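To make that error message more concrete, here is a minimal sketch of how I understand SSH key exchange negotiation: simplified, the client picks the first algorithm in its own list that the server also offers. The function name and the "modern" client list below are illustrative assumptions, not the defaults of any real SSH client.

```python
from typing import Optional, Sequence


def negotiate_kex(client_algos: Sequence[str], server_algos: Sequence[str]) -> Optional[str]:
    """Return the first client algorithm the server also offers, else None.

    Simplified model of the negotiation; a real implementation has more rules.
    """
    server_set = set(server_algos)
    for algo in client_algos:
        if algo in server_set:
            return algo
    return None


# What our Cisco devices offered, according to the logs.
CISCO_KEX = [
    "diffie-hellman-group-exchange-sha1",
    "diffie-hellman-group14-sha1",
]

# Illustrative client list without any SHA-1 method (an assumption for the example).
MODERN_CLIENT_KEX = [
    "curve25519-sha256",
    "ecdh-sha2-nistp256",
    "diffie-hellman-group16-sha512",
    "diffie-hellman-group14-sha256",
]

if __name__ == "__main__":
    # No overlap between the two lists: this is the "no matching kex" situation.
    print(negotiate_kex(MODERN_CLIENT_KEX, CISCO_KEX))
    # Once the client also offers a SHA-1 method, the negotiation succeeds.
    print(negotiate_kex(MODERN_CLIENT_KEX + ["diffie-hellman-group14-sha1"], CISCO_KEX))
```

With no common entry between the two lists, there is nothing to agree on, and the connection fails before authentication even starts.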

I can connect to the other devices, right?

Turns out we had the same error on every Cisco router in our racks. That was even more annoying. It was not an isolated case anymore, so we had to react quickly.

We tried to SSH to the devices from the gateway and got the same error. We're on a regular, up-to-date Debian 13 machine, don't we have the diffie-hellman kex? Turns out we have to explicitly enable them. Solving this should be easy, just add the following to our SSH config file:

KexAlgorithms +diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1
HostKeyAlgorithms +ssh-rsa

And yup, that works well, we're able to SSH to our Cisco devices again. I supposed that the OpenSSH build for our Debian 13 got updated when we made the upgrades after CopyFail ... and DirtyFrag ... and Fragnesia ... etc. And maybe this new build changed the default kex. We just needed to deploy our fix.

Our backend, which starts the connection, does not have direct access to the devices; it proxy jumps through the gateway instead.

Cisco router <--> Gateway <--> App's backend

Thus we have to change the config in the backend, not the local SSH config on the gateway. But our app is deployed in a container so we should try from within a container to verify it's going to work properly.

                                _Host__________
Cisco router <--> Gateway <--> | App's backend |   it's in a container!
                                ---------------

The host of our backend is running the same Debian 13 as the gateway. When we tried to proxy jump from the container using the correct SSH options, we could indeed establish a connection to our devices.

Let's go ahead then, we just need to apply the patch to use the correct SSH options. Let's not mourn over that OpenSSH build.

One important thing: the backend is running Python, and we are using the netmiko library to connect to the devices. The cool thing is that we can just give an ssh_config file to netmiko's ConnectHandler and go with the flow.
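Roughly, passing an ssh_config file to netmiko can be sketched like this. This is not our actual code: the helper names, host, credentials and file path are hypothetical, and I am assuming the cisco_xe device type for the routers. The netmiko import is done lazily so the parameter-building part can be read and run on its own.

```python
from typing import Any, Dict


def build_device_params(host: str, username: str, password: str,
                        ssh_config_file: str) -> Dict[str, Any]:
    """Build the keyword arguments handed to netmiko's ConnectHandler."""
    return {
        "device_type": "cisco_xe",           # assumed device type for this sketch
        "host": host,
        "username": username,
        "password": password,
        "ssh_config_file": ssh_config_file,  # path to the ssh_config used for the connection
    }


def fetch_version(params: Dict[str, Any]) -> str:
    """Open a session and run a single show command (needs netmiko installed)."""
    from netmiko import ConnectHandler  # imported lazily on purpose

    with ConnectHandler(**params) as conn:
        return conn.send_command("show version")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    params = build_device_params("192.0.2.10", "ztp", "secret", "/app/ssh_config")
    print(sorted(params))
```

Calling fetch_version(params) is the part that actually opens the SSH session, so it only makes sense against a reachable device.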

We were already passing an ssh_config for proxy jumping anyway! Add the lines to git ... commit, push and let the CI/CD deploy that for us.

But ... it seems the problem is not solved yet. It still is not possible to connect to the devices. We get the exact same error: no match on our key exchange algorithm. We need to go deeper.


We were trying to launch a task to fetch the serial number of a Cisco ASR920, the device that had initially caught our attention. This kind of task is handled at random by one of the twenty backend workers. For debugging, we did something we could not have told our boss about (luckily he was on vacation). It was lunch time, so we reduced the number of workers down to one, basically breaking production, and started debugging directly from within the only worker's container.

docker exec, apt update, apt install vim, open the lines where we placed the config, and we started printing some more debugging info. Everything seemed perfectly in place. We obviously had no issue with our code. It took about three hours of debugging and reading docs about SSH key exchanges, SSH configs, and how to give configs to netmiko and paramiko. We couldn't figure out the issue. We had to go deeper again.


Even though lunch time is quite long in France, we were blocking production for too long, so we decided to stop there, bring back the twenty workers, and spend some time on it later when we would not be annoying anybody. That said, a lot of things were still broken due to the original issue, so I kept looking for a solution. I brought up my favorite text editor and started diving into netmiko's and paramiko's code. I especially looked at the SSH config handling part. That's when I realized they do not use the system's SSH libraries: everything is reimplemented in Python, including the ssh config parsing. So paramiko is responsible for that part. I mean, if we had upgraded netmiko we probably would have expected something like this to happen, but we had not. We do not have a direct dependency on paramiko. It is pulled in by netmiko.

We had deployed a new patch the day before, but it was totally unrelated to netmiko, paramiko or anything close to this part of the project. Still, it cost nothing to check paramiko's release notes.

[Support]: Removed support for key exchange using SHA-1, meaning the kex methods diffie-hellman-group-exchange-sha1, diffie-hellman-group14-sha1, and diffie-hellman-group1-sha1 are now gone. Implementing classes have been removed/merged/shuffled as required.

But we're not on the latest version, right? grep paramiko uv.lock ... Oh ... we actually are on the latest version of paramiko, 5.0.0. But our project pins its dependencies so that only minor upgrades are allowed, provided semantic versioning is respected. Had a minor version change in netmiko upgraded paramiko with a breaking change?
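Besides grepping the lock file, the version that actually ended up installed in the container can be checked from Python itself. A minimal sketch using only the standard library (the helper name is mine):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print what is really installed in the current environment.
    for name in ("netmiko", "paramiko"):
        print(name, installed_version(name) or "not installed")
```

Running this inside the worker's container reports what the build really pulled in, transitive dependencies included.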

Latest netmiko release notes said:

Update poetry lock file to pin Paramiko below version 5 (paramiko5 caused significant breakage) in #3852

We surely are not using the latest version. What does the netmiko pyproject of our version look like?

paramiko = ">=3.5.0"

Really? I mean ... so you're taking any version above 3.5.0 by default? That seems risky, doesn't it? Why not use a ~ instead of that scary > ?

We realized what happened. Yesterday's deployment triggered a new pipeline, which rebuilt the project with an old version of netmiko, which in turn pulled in a new version of paramiko featuring quite a few breaking changes. It was totally transparent to us.

Netmiko now pins paramiko = ">=3.5.0,<5.0", which definitely is not as good as having paramiko = "~=3.5.0", but it does effectively bring paramiko back down to version 4.

I was pretty confident, so I simply updated netmiko to its latest version and deployed the project to production. And well, it did fix the issue. We could connect to our Cisco routers just like before.

After three hours of live and dirty debugging, we ended up fixing the issue by simply upgrading a dependency.
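To see why the open-ended constraint let paramiko 5 in, here is a tiny hand-rolled checker for the three operators involved (>=, < and ~=), following my understanding of PEP 440 semantics. It is only a sketch for plain numeric versions; real tools handle many more cases, and the function names are mine.

```python
from typing import Tuple


def parse(text: str) -> Tuple[int, ...]:
    """Turn '3.5' or '3.5.0' into a tuple padded to three components."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(candidate: str, spec: str) -> bool:
    """Check a version against comma-separated clauses using >=, < or ~= only."""
    ver = parse(candidate)
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith(">="):
            ok = ver >= parse(clause[2:])
        elif clause.startswith("~="):
            text = clause[2:].strip()
            keep = len(text.split(".")) - 1  # leading components that must match
            base = parse(text)
            ok = ver >= base and ver[:keep] == base[:keep]
        elif clause.startswith("<") and not clause.startswith("<="):
            ok = ver < parse(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    for spec in (">=3.5.0", ">=3.5.0,<5.0", "~=3.5.0"):
        accepted = [v for v in ("3.5.1", "4.0.0", "5.0.0") if satisfies(v, spec)]
        print(spec, "accepts", accepted)
```

Under these rules, ">=3.5.0" accepts 5.0.0, ">=3.5.0,<5.0" stops at the 4.x series, and "~=3.5.0" only accepts 3.5.x releases.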

To Debian and OpenSSH: I'm sorry I blamed you for no reason.

To Cisco: please support more recent algorithms!

To netmiko: please be more careful with version tagging!

To answer paramiko's FAQ: We do not live in an ideal world. Those algorithms worked for many years. We now know they are not secure enough, so closing issues about them is quite fine. You definitely should not spend time on those and, besides that, I admire the work you do. I just think that removing support entirely has too much of an impact in today's imperfect world.


Hope you enjoyed reading my adventures and I wish you a nice day!