Discussion:
[Openswan Users] stuck in STATE_MAIN_I3/STATE_MAIN_R2
Oskar Liljeblad
2004-10-11 15:36:32 UTC
Permalink
I have a simple OpenS/WAN setup between two linux 2.6.7 boxes, both running
2.2.0-4 (debian). They both have public static IPs. I use X509 certificates
for initial authentication.

It appears that the box which initiates the VPN, alpha, gets stuck in state
STATE_MAIN_I3. The other end, beta, is in STATE_MAIN_R2:

beta #1: responding to Main Mode
beta #1: transition from state (null) to state STATE_MAIN_R1
beta #1: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
beta #1: max number of retransmissions (2) reached STATE_MAIN_R2

alpha #1: transition from state STATE_MAIN_I2 to state STATE_MAIN_I3
alpha #1: discarding duplicate packet; already STATE_MAIN_I3
alpha #1: discarding duplicate packet; already STATE_MAIN_I3
alpha #1: max number of retransmissions (2) reached STATE_MAIN_I3. Possible authentication failure: no acceptable response to our first encrypted message
alpha #1: starting keying attempt 2 of an unlimited number

and so it repeats.

With tcpdump I'm seeing these packets on alpha:

17:31:29.274391 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.292350 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.320873 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.396066 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.518108 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident[E]
[more packets come 10 seconds later]

and on beta:

17:31:29.282465 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.283566 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.328918 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.384198 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
[more packets come 20 seconds later]

The setup worked perfectly with 2.1.3, but recently I
upgraded to 2.2.0 and some time later it stopped working.
However, downgrading to 2.1.3 did not help (same issue).
I have no idea what else changed. I even reinstalled openswan
on both machines. Since it stopped working all of a sudden,
I'm starting to believe that one of the ISPs is blocking vital
VPN traffic. Or the kernel on one of the machines...?

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Oskar Liljeblad
2004-10-11 17:03:28 UTC
Permalink
I enabled "control parsing" debugging and here's some output on the two
machines (alpha is initiator):

BETA:
"beta-alpha" #1: transition from state (null) to state STATE_MAIN_R1
| inserting event EVENT_RETRANSMIT, timeout in 10 seconds for #1
| next event EVENT_RETRANSMIT in 10 seconds for #1
|
| *received 244 bytes from 212.181.56.10:500 on eth0
| **parse ISAKMP Message:
| initiator cookie:
| 14 62 fd 15 da a7 af 7b
| responder cookie:
| 05 3e 7f 18 42 eb 4f d1
| next payload type: ISAKMP_NEXT_KE
| ISAKMP version: ISAKMP Version 1.0
| exchange type: ISAKMP_XCHG_IDPROT
| flags: none
| message ID: 00 00 00 00
| length: 244
| ICOOKIE: 14 62 fd 15 da a7 af 7b
| RCOOKIE: 05 3e 7f 18 42 eb 4f d1
| peer: d4 b5 38 0a
| state hash entry 7
| peer and cookies match on #1, provided msgid 00000000 vs 00000000
| state object #1 found, in STATE_MAIN_R1
| ***parse ISAKMP Key Exchange Payload:
| next payload type: ISAKMP_NEXT_NONCE
| length: 196
| ***parse ISAKMP Nonce Payload:
| next payload type: ISAKMP_NEXT_NONE
| length: 20
"beta-alpha" #1: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
| inserting event EVENT_RETRANSMIT, timeout in 10 seconds for #1
| next event EVENT_RETRANSMIT in 10 seconds for #1
[next event happens 10 seconds later]

======================================================================

ALPHA:
"beta-alpha" #1: transition from state STATE_MAIN_I1 to state STATE_MAIN_I2
| inserting event EVENT_RETRANSMIT, timeout in 10 seconds for #1
| next event EVENT_RETRANSMIT in 10 seconds for #1
|
| *received 372 bytes from 194.236.130.162:500 on eth0
| **parse ISAKMP Message:
| initiator cookie:
| 14 62 fd 15 da a7 af 7b
| responder cookie:
| 05 3e 7f 18 42 eb 4f d1
| next payload type: ISAKMP_NEXT_KE
| ISAKMP version: ISAKMP Version 1.0
| exchange type: ISAKMP_XCHG_IDPROT
| flags: none
| message ID: 00 00 00 00
| length: 372
| ICOOKIE: 14 62 fd 15 da a7 af 7b
| RCOOKIE: 05 3e 7f 18 42 eb 4f d1
| peer: c2 ec 82 a2
| state hash entry 6
| peer and cookies match on #1, provided msgid 00000000 vs 00000000
| state object #1 found, in STATE_MAIN_I2
| ***parse ISAKMP Key Exchange Payload:
| next payload type: ISAKMP_NEXT_NONCE
| length: 196
| ***parse ISAKMP Nonce Payload:
| next payload type: ISAKMP_NEXT_CR
| length: 20
| ***parse ISAKMP Certificate RequestPayload:
| next payload type: ISAKMP_NEXT_NONE
| length: 128
| cert type: CERT_X509_SIGNATURE
| CR 30 79 31 0b 30 09 06 03 55 04 06 13 02 53 45 31
| 0e 30 0c 06 03 55 04 08 13 05 53 6b 61 6e 65 31
| 0e 30 0c 06 03 55 04 07 13 05 4d 61 6c 6d 6f 31
| 18 30 16 06 03 55 04 0a 13 0f 4f 73 6b 61 72 20
| 4c 69 6c 6a 65 62 6c 61 64 31 0e 30 0c 06 03 55
| 04 03 13 05 61 6c 70 68 61 31 20 30 1e 06 09 2a
| 86 48 86 f7 0d 01 09 01 16 11 6f 73 6b 61 72 40
| 6f 73 6b 2e 6d 69 6e 65 2e 6e 75
| requested CA: 'C=SE, ST=Skane, L=Malmo, O=Oskar Liljeblad, CN=alpha,
E=***@osk.mine.nu'
| thinking about whether to send my certificate:
| I have RSA key: OAKLEY_RSA_SIG cert.type: CERT_X509_SIGNATURE
| sendcert: CERT_ALWAYSSEND and I did get a certificate request
| so send cert.
| I am sending a certificate request
"beta-alpha" #1: I am sending my cert
"beta-alpha" #1: I am sending a certificate request
| looking for secret for C=SE, ST=Skane, L=Malmo, O=Oskar Liljeblad,
CN=alpha, E=***@osk.mine.nu->C=SE, ST=Skane, L=Malmo, O=Oskar
Liljeblad, CN=beta, E=***@osk.mine.nu of kind PPK_RSA
| searching for certificate PPK_RSA:AwEAAdcIO vs PPK_RSA:AwEAAdcIO
| signing hash with RSA Key *AwEAAdcIO
"beta-alpha" #1: transition from state STATE_MAIN_I2 to state STATE_MAIN_I3
| inserting event EVENT_RETRANSMIT, timeout in 10 seconds for #1
| next event EVENT_RETRANSMIT in 10 seconds for #1
|
| *received kernel message
| initiate on demand from 212.181.56.10:0 to 194.236.130.162:0 proto=0
state: fos_start because: acquire
| find_connection: looking for policy for connection: 212.181.56.10:0/0 ->
194.236.130.162:0/0
| find_connection: conn "beta-alpha" has compatible peers:
212.181.56.10/32 -> 194.236.130.162/32 [pri: 16842765]
| find_connection: concluding with "beta-alpha" [pri:16842765]{0x80eddc0}
kind=CK_PERMANENT
| assign hold, routing was prospective erouted, needs to be erouted HOLD
| delete bare shunt: null pointer
| Queuing pending Quick Mode with 194.236.130.162 "beta-alpha"
| next event EVENT_RETRANSMIT in 10 seconds for #1
[next event happens 9 seconds later]

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Herbert Xu
2004-10-12 02:08:47 UTC
Permalink
Post by Oskar Liljeblad
17:31:29.274391 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.292350 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.320873 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.396066 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.518108 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident[E]
The last packet never shows up on beta. This is clearly a network
problem. Check the packet size with tcpdump -v and see if there is
an MTU issue on that path.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Oskar Liljeblad
2004-10-12 05:34:36 UTC
Permalink
Post by Herbert Xu
Post by Oskar Liljeblad
17:31:29.274391 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.292350 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.320873 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident
17:31:29.396066 IP beta.isakmp > alpha.isakmp: isakmp: phase 1 R ident
17:31:29.518108 IP alpha.isakmp > beta.isakmp: isakmp: phase 1 I ident[E]
The last packet never shows up on beta. This is clearly a network
problem. Check the packet size with tcpdump -v and see if there is
an MTU issue on that path.
The last packet among the five packets above is:

07:28:05.646761 IP (tos 0x0, ttl 64, id 764, offset 0, flags [+], length:
1500) alpha.isakmp > beta.isakmp: isakmp 1.0 msgid : phase 1 I ident[E]:
[encrypted id] (len mismatch: isakmp 1652/ip 1472)

If I make beta initiator, this is the last packet:

07:33:38.791543 IP (tos 0x0, ttl 64, id 7198, offset 0, flags [+], length:
1500) beta.isakmp > erikslust-56-10.ip-pluggen.com.isakmp: isakmp 1.0 msgid
: phase 1 ? ident[E]: [encrypted id] (len mismatch: isakmp 1644/ip 1472)

So it's an MTU issue? MTU on the internet-interface is 1500 on both
machines.

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Herbert Xu
2004-10-12 09:51:41 UTC
Permalink
Post by Oskar Liljeblad
[encrypted id] (len mismatch: isakmp 1652/ip 1472)
Unless your tcpdump command is broken, this looks like an openswan
problem. Please find the debug message from pluto about this packet.
It should tell us where the problem is.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Oskar Liljeblad
2004-10-12 15:02:16 UTC
Permalink
Post by Herbert Xu
Unless your tcpdump command is broken, this looks like an openswan
problem. Please find the debug message from pluto about this packet.
It should tell us where the problem is.
I'm using tcpdump 3.8.3.

The pluto 'emitting' debug for the tcpdumped packet

IP (tos 0x0, ttl 64, id 4160, offset 0, flags [+], length: 1500) alpha.isakmp > beta.isakmp: isakmp 1.0 msgid : phase 1 I ident[E]: [encrypted id] (len mismatch: isakmp 1652/ip 1472)

is this:

"beta-alpha" #1: I am sending a certificate request
| ***emit ISAKMP Certificate RequestPayload:
| next payload type: ISAKMP_NEXT_SIG
| cert type: CERT_X509_SIGNATURE
| emitting 122 raw bytes of CA into ISAKMP Certificate RequestPayload
| CA 30 78 31 0b 30 09 06 03 55 04 06 13 02 53 45 31
| 0e 30 0c 06 03 55 04 08 13 05 53 6b 61 6e 65 31
| 0e 30 0c 06 03 55 04 07 13 05 4d 61 6c 6d 6f 31
| 18 30 16 06 03 55 04 0a 13 0f 4f 73 6b 61 72 20
| 4c 69 6c 6a 65 62 6c 61 64 31 0d 30 0b 06 03 55
| 04 03 13 04 62 65 74 61 31 20 30 1e 06 09 2a 86
| 48 86 f7 0d 01 09 01 16 11 6f 73 6b 61 72 40 6f
| 73 6b 2e 6d 69 6e 65 2e 6e 75
| emitting length of ISAKMP Certificate RequestPayload: 127
| ***emit ISAKMP Signature Payload:
| next payload type: ISAKMP_NEXT_NONE
| emitting 256 raw bytes of SIG_I into ISAKMP Signature Payload
| SIG_I 70 f9 22 dd 79 f0 2e 5c 4f b3 09 4d b5 60 56 de
| 90 6c bf b7 fa 29 e1 3b 5f 71 94 05 c3 c2 57 ed
| 30 50 ed 8e 0a f4 39 c2 45 7d 26 b1 07 25 cd 9f
| f5 d0 c0 38 09 49 d9 ac cd 05 72 38 6b 60 5b 84
| 9a 89 76 6b a1 f8 b0 2c de 24 8a aa 97 4d ae b4
| a4 6a d4 ea 7e 67 e9 ea 58 ae 46 02 44 aa 83 6b
| 5e 90 31 45 0c 7f 68 8c 64 c6 45 25 70 05 f0 5d
| f8 2f f7 39 18 ab 57 d4 65 f0 5e 5f a9 95 74 4b
| a4 2f be 7c 39 73 9c d6 3a b6 d6 80 e5 5e 5e 31
| dd 14 63 92 bf e3 d1 3c d2 50 3e f3 81 a7 d6 0e
| 63 20 39 9b 8b 9a 4e 70 89 aa d6 f1 55 82 80 4f
| a6 77 a3 e7 36 9c 4f 55 af 1f b4 32 b3 81 e8 af
| 47 a4 cb 77 66 88 12 6b f4 c7 58 4d 46 f8 57 cb
| f0 51 d5 5e bd 9a ea 2e 9b c2 d5 ab b1 61 db c3
| 2c 4f 09 08 ba 8d 51 80 1f 8d 0a dd aa 55 29 57
| 5b 5a ef ba ad 7c 36 2c e2 8f f6 26 65 a9 f3 e4
| emitting length of ISAKMP Signature Payload: 260
| emitting 5 zero bytes of encryption padding into ISAKMP Message
| emitting length of ISAKMP Message: 1652
"beta-alpha" #1: transition from state STATE_MAIN_I2 to state STATE_MAIN_I3

Could it be the ISP blocking this packet? To test this possibility I set up
openswan on a third system (on a third ISP).

alpha-beta fails as above
alpha-third fails as above
beta-third OK!

I double-checked configurations and disabled iptables firewalls on alpha,
beta and the third computer.

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Oskar Liljeblad
2004-10-12 20:58:28 UTC
Permalink
Post by Herbert Xu
Unless your tcpdump command is broken, this looks like an openswan
problem. Please find the debug message from pluto about this packet.
It should tell us where the problem is.
As recommended by PolarWolf on IRC, I tried to reduce the x509 certificate
key length from 2048 to 1024 on both ends. Now the VPN is up and running
correctly... But where's the bug?

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Herbert Xu
2004-10-12 21:23:59 UTC
Permalink
Post by Oskar Liljeblad
Could it be the ISP blocking this packet? To test this possibility I set up
openswan on a third system (on a third ISP).
alpha-beta fails as above
alpha-third fails as above
beta-third OK!
Does ping -s 1472 beta/third work from alpha?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Oskar Liljeblad
2004-10-13 06:23:58 UTC
Permalink
Post by Herbert Xu
Post by Oskar Liljeblad
Could it be the ISP blocking this packet? To test this possibility I set up
openswan on a third system (on a third ISP).
alpha-beta fails as above
alpha-third fails as above
beta-third OK!
Does ping -s 1472 beta/third work from alpha?
On alpha, without VPN:
ping -s 1472 beta OK
ping -s 1473 beta no reply
ping -s 1472 third OK
ping -s 1473 third no reply

On alpha, with VPN to beta:
ping -s 1472 beta fails

When I try to find the maximum working packet size with ping with
VPN enable I get weird results: First ping works. Then next time
I try I might get
Herbert Xu
2004-10-13 08:18:27 UTC
Permalink
Post by Herbert Xu
Post by Oskar Liljeblad
[encrypted id] (len mismatch: isakmp 1652/ip 1472)
Unless your tcpdump command is broken, this looks like an openswan
Actually I misread your packet dump. The flag '+' above indicates that
this is a fragment. So it looks like something in the path can't deal
with fragments. This is confirmed by your ping results as 1473 is the
smallest ICMP payload where you start getting fragments with an MTU of
1500.

Do a traceroute from alpha to beta and ping -s 1473 each hop to discover
where the problem is.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Oskar Liljeblad
2004-10-13 09:20:07 UTC
Permalink
Post by Herbert Xu
Actually I misread your packet dump. The flag '+' above indicates that
this is a fragment. So it looks like something in the path can't deal
with fragments. This is confirmed by your ping results as 1473 is the
smallest ICMP payload where you start getting fragments with an MTU of
1500.
Do a traceroute from alpha to beta and ping -s 1473 each hop to discover
where the problem is.
Thanks Herbert. There are some weird hops in the middle of the
traceroute (hops with 192.168-adresses) that I cannot ping at all.
I think those are the root of the problem, since I can't ping -s 1473
anything beyond them.

Is it acceptable that your ISP cannot deal with fragmented packets?
Is there anything else besides big pings and IPsec that will break
because of this? (I tried hping2 and I know it doesn't deal with
fragmented UDP packets either. Isn't it strange that other traffic
works well otherwise?) The reason I want to know is so that
I have a good case to present for my ISP...

Regards,

Oskar Liljeblad (***@osk.mine.nu)
Herbert Xu
2004-10-13 09:55:25 UTC
Permalink
Post by Oskar Liljeblad
Is it acceptable that your ISP cannot deal with fragmented packets?
Whether it's acceptable or not depends on whether this is going to
dent their bottom line :)

99% of the Internet is TCP, which doesn't use fragments normally.
I'd guess that the most commonly broken application if you filter
out all fragments would be the various VPN solutions, IPsec/PPTP/etc.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Continue reading on narkive:
Loading...