ofEvents - Multiple registrations = disaster?

Hi,

I’ve been testing some android life cycle stuff on latest master and I bumped into a reproducible crash caused when notifying an event:

ofNotifyEvent(ofxAndroidEvents().reloadGL);

I investigated further and I saw that it’s not the recipient’s code that fails, but the event notification code failing when trying to call some specific recipient.

I eventually fixed the crash in ofTrueTypeFont.cpp, by changing the event registration to be in the constructor instead of the “load” method, so that it will only get registered once.

Theoretically, there shouldn’t be a problem with registering the same handler multiple times, right?

ofEvents undergone some serious changes lately, hasn’t it? I seem to recall that it was based on Poco events, but now it seems to be a proprietary solution with c++11 std functions. Maybe there’s a bug lurking somewhere in there?

Tal

can you post the stack trace? and perhaps open an issue in github better

Stack trace is easy, but not very revealing:

10-05 16:00:56.847: I/DEBUG(3123): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-05 16:00:56.847: I/DEBUG(3123): Build fingerprint: 'htc/htc_europe/htc_m8:5.0.1/LRX22C/448934.10:user/release-keys'
10-05 16:00:56.847: I/DEBUG(3123): Revision: '0'
10-05 16:00:56.847: I/DEBUG(3123): ABI: 'arm'
10-05 16:00:56.847: I/DEBUG(3123): pid: 11197, tid: 13166, name: GLThread 41001  >>> cc.openframeworks.androidNativeActivityExample <<<
10-05 16:00:56.847: I/DEBUG(3123): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x1300120
10-05 16:00:56.857: I/DEBUG(3123):     r0 ba1260fc  r1 00000000  r2 00000001  r3 01300120
10-05 16:00:56.867: I/DEBUG(3123):     r4 ba1de35c  r5 ba03d054  r6 ba1de370  r7 ba262a30
10-05 16:00:56.867: I/DEBUG(3123):     r8 ba126188  r9 00000010  sl 9e2af934  fp 00000000
10-05 16:00:56.867: I/DEBUG(3123):     ip 9d522000  sp 9e2af928  lr a122146c  pc 01300120  cpsr a00f0010
10-05 16:00:56.867: I/DEBUG(3123): backtrace:
10-05 16:00:56.867: I/DEBUG(3123):     #00 pc 01300120  <unknown>
10-05 16:00:56.867: I/DEBUG(3123):     #01 pc 00087468  /mnt/asec/cc.openframeworks.androidNativeActivityExample-2/lib/arm/libOFAndroidApp_neon.so
10-05 16:00:56.867: I/DEBUG(3123):     #02 pc 0008751c  /mnt/asec/cc.openframeworks.androidNativeActivityExample-2/lib/arm/libOFAndroidApp_neon.so (ofNotifyEvent(ofEvent<void, std::mutex>&)+8)
10-05 16:00:56.867: I/DEBUG(3123):     #03 pc 000878ec  /mnt/asec/cc.openframeworks.androidNativeActivityExample-2/lib/arm/libOFAndroidApp_neon.so (Java_cc_openframeworks_OFAndroid_onSurfaceCreated+552)
10-05 16:00:56.867: I/DEBUG(3123):     #04 pc 000bf3f1  /data/dalvik-cache/arm/mnt@asec@cc.openframeworks.androidNativeActivityExample-2@base.apk@classes.dex

But you can see that it crashed inside ofNotifyEvent and haven’t reached a listener.

I didn’t want to open a bug without being sure there is a problem that is reproducible, and currently it’s on my own master derived branch… I’ll try to reproduce it on actual master before opening an issue, but here’s some more info that might allow to track down the problem:

I added some debug logs to see what’s happening, so the relevant part looks like this:

ofLogVerbose("ofAppAndroidWindow") << "onSurfaceCreated, calling reloadGL";
ofLogVerbose("ofAppAndroidWindow") << "functions size = " << ofxAndroidEvents().reloadGL.size();
for(auto & f: ofxAndroidEvents().reloadGL.functions){
        ofLogVerbose("ofAppAndroidWindow") << "target_type = " << f.function.target_type().name();
    		}
ofNotifyEvent(ofxAndroidEvents().reloadGL); //<<<<<< actual call to the event. We crash here.
ofLogVerbose("ofAppAndroidWindow") << "onSurfaceCreated, called reloadGL";

And here’s the log:

10-05 16:00:56.697: V/ofAppAndroidWindow(11197): onSurfaceCreated, calling reloadGL
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): functions size = 8
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI13ofxAndroidAppEEN2of4priv8FunctionIvEEPT_MS8_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI14ofTrueTypeFontEEN2of4priv8FunctionIvEEPT_MS8_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofAppAndroidWindow(11197): target_type = ZN7ofEventIvSt5mutexE13make_functionI8ofImage_IhEEEN2of4priv8FunctionIvEEPT_MS9_FvvEiEUlPKvE_
10-05 16:00:56.697: V/ofxAndroidApp(11197): reloadGL START
10-05 16:00:56.697: V/ofxAndroidApp(11197): reloadGL END
10-05 16:00:56.697: V/ofTrueTypeFont(11197): reloadTextures START
10-05 16:00:56.737: V/ofTrueTypeFont(11197): reloadTextures END
10-05 16:00:56.737: A/libc(11197): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x1300120 in tid 13166 (GLThread 41001)

I printed the listeners (I wanted to print the actual object pointers but couldn’t figure out if it’s even possible). There’s the app, one trueTypeFont, and then 6 images. The event reaches the app and then the true type font, but never reaches the images (I have a log there also, of course). Now, the reason I think it’s a memory smear of some sort, is that if I fix that registration in true type font, the event can reach the images that follow it. Also, if I remove 5 of the 6 images, it also works…

In short - it’s not that easy to find the exact reproduction steps because things affect things strangely. But I’m pretty sure it’s enough proof that there’s a problem inside ofEvents, right?

can you try to get the stacktrace using ndk-gdb in debug mode?

if you run it with ndk-gdb, in the project folder like:

ndk-gdb -p . --start

it’ll stop when the application crash then typing bt you can get the stack trace with line numbers…

couldn’t get it to work.

First, I needed to add

--adb=<path to adb>

Then, I needed to copy the contents of libs/armeabi-v7a to libs/ since it wanted it there.

Now, it connects and then fails with this:

## COMMAND: adb_cmd forward tcp:65534 jdwp:2025
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:196)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at com.sun.tools.jdi.SocketTransportService.handshake(SocketTransportService.java:130)
	at com.sun.tools.jdi.SocketTransportService.attach(SocketTransportService.java:232)
	at com.sun.tools.jdi.GenericAttachingConnector.attach(GenericAttachingConnector.java:116)
	at com.sun.tools.jdi.SocketAttachingConnector.attach(SocketAttachingConnector.java:90)
	at com.sun.tools.example.debug.tty.VMConnection.attachTarget(VMConnection.java:519)
	at com.sun.tools.example.debug.tty.VMConnection.open(VMConnection.java:328)
	at com.sun.tools.example.debug.tty.Env.init(Env.java:63)
	at com.sun.tools.example.debug.tty.TTY.main(TTY.java:1066)

Fatal error:
Unable to attach to target VM.
Pulled cache.so from device/emulator.
$NDK_PATH/ndk-gdb: line 839: adb: command not found
Pulled lib.so from device/emulator.
$NDK_PATH/ndk-gdb: line 839: adb: command not found
Pulled shared_prefs.so from device/emulator.
$NDK_PATH/ndk-gdb: line 839: adb: command not found
cp: ./libs/armeabi/gdb.setup: No such file or directory
GNU gdb (GDB) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-apple-darwin --target=arm-linux-android".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://source.android.com/source/report-bugs.html>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
./obj/local/armeabi/gdb.setup:9: Error in sourced command file:
Remote connection closed
(gdb) 

And I couldn’t find a solution for this. Wasn’t my steps earlier supposed to work too, without debuging “live” but instead translating the stack after the fact?

Meanwhile, I was also sifting through ofEvent’s code and I have a theory. In the next reply.

I have a theory, but this code is kinda foreign to me, so I may be speaking nonsense, but here goes:

  for(auto & f: functions_copy){
        			    bool ret = false;
        			    try{
    			        ret = f->function(sender);
    			    }catch(std::bad_function_call &){}
    				if(ret){
    					throw ofEventAttendedException();
    				}
    			}
    			ofEvent<void,Mutex>::notifying = false;

I see that if a handler is saying that the event is handled, an exception is thrown, but then notifying is not being marked as false.

But in “remove”, if we try to remove a handler while notifying:

if(notifying){
            for(auto & f: functions){
                if(f == function){
                    f.function = nullptr;
                }
            }

It doesn’t delete it, but instead set it to nullptr. And I don’t see a check for that when notifying, so perhaps this is what happening somehow?

P.S.

Accidently deleted the extended stack trace from the debug build by running it over when editing, so here it is again:

10-05 22:24:59.645: I/DEBUG(3123): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-05 22:24:59.645: I/DEBUG(3123): Build fingerprint: 'htc/htc_europe/htc_m8:5.0.1/LRX22C/448934.10:user/release-keys'
10-05 22:24:59.645: I/DEBUG(3123): Revision: '0'
10-05 22:24:59.645: I/DEBUG(3123): ABI: 'arm'
10-05 22:24:59.645: I/DEBUG(3123): pid: 15901, tid: 17194, name: GLThread 3221  >>> cc.openframeworks.androidNativeActivityExample <<<
10-05 22:24:59.645: I/DEBUG(3123): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8ff08fe
10-05 22:24:59.825: I/DEBUG(3123):     r0 b951b724  r1 00000000  r2 9e19f5f4  r3 00000000
10-05 22:24:59.825: I/DEBUG(3123):     r4 08ff08ff  r5 b951b724  r6 b951b6f0  r7 12dc1160
10-05 22:24:59.825: I/DEBUG(3123):     r8 12e235e0  r9 b9428908  sl 00000001  fp 9e19f5e4
10-05 22:24:59.825: I/DEBUG(3123):     ip 00000001  sp 9e19f5d0  lr a0ff6c80  pc 08ff08fe  cpsr 600f0030
10-05 22:24:59.825: I/DEBUG(3123): backtrace:
10-05 22:24:59.825: I/DEBUG(3123):     #00 pc 08ff08fe  <unknown>
10-05 22:24:59.825: I/DEBUG(3123):     #01 pc 0007ec7c  /mnt/asec/cc.openframeworks.androidNativeActivityExample-1/lib/arm/libOFAndroidApp.so (std::function<bool (void const*)>::operator()(void const*) const+88)
10-05 22:24:59.825: I/DEBUG(3123):     #02 pc 0007cb08  /mnt/asec/cc.openframeworks.androidNativeActivityExample-1/lib/arm/libOFAndroidApp.so (ofEvent<void, std::mutex>::notify(void const*)+320)
10-05 22:24:59.825: I/DEBUG(3123):     #03 pc 00077970  /mnt/asec/cc.openframeworks.androidNativeActivityExample-1/lib/arm/libOFAndroidApp.so (ofNotifyEvent(ofEvent<void, std::mutex>&)+24)
10-05 22:24:59.825: I/DEBUG(3123):     #04 pc 0007a9c8  /mnt/asec/cc.openframeworks.androidNativeActivityExample-1/lib/arm/libOFAndroidApp.so (Java_cc_openframeworks_OFAndroid_onSurfaceCreated+872)
10-05 22:24:59.825: I/DEBUG(3123):     #05 pc 000bf3f1  /data/dalvik-cache/arm/mnt@asec@cc.openframeworks.androidNativeActivityExample-1@base.apk@classes.dex

UPDATE: after some more sifting, I found where null functions are being erased, but I don’t think it’s fool proof. Anyway, even without line numbers, the stack clearly show that the crashing line is:

ret = f->function(sender);

So can line numbers teach us anything else?

that’s correct, now just press c for continue and it’ll run the application

but it does run the application, even without the c. But the debugger seems to be disconnected as soon as the app is loaded on the device. Inputing c doesn’t do anything in particular.

(gdb) c
The program is not being run.

I also tried ignoring the errors and just continuing with reproducing the error, and after the crash inputing ft, but nothing:

(gdb) ft
No default breakpoint address now.

I also tried a second device, but got same results.

More sifting. I think it’s indeed related to the fact that ofTrueTypeFont adds a listener in load, which is called from the event that is currently being fired.

I think that adding that extra listener botches up the memory somehow and causes the next listener to fail.

Maybe the emplace() in add() changes the addresses for the existing members and now functions_copy has invalid pointers?

UPDATE: yep. it seems that this is indeed what’s happening: http://www.cplusplus.com/reference/vector/vector/emplace/

yeah, both things are bugs, thanks for looking into it, i’m fixing it now

i’ve just pushed some changes to ofEvent.h that should fix this problems but let me know if there’s still any issue. thanks!

Hi,

I am getting a complete unsubscribe from update events in my app when I unregister from the update callback in any one class. This is in our main app which has a number of objects (living in their own threads in some cases) listening to the update callback at any given time.

Note: there is no crash, it just appears that all classes unregister when I unsubscribe from one, and the view stays the way it is. The app appears hung, but tapping on the screen will register input on the Java end.

To test this further, I reversed the following commit:

https://github.com/openframeworks/openFrameworks/commit/48d58e8dadc00670e410eae4672fb7706eada199

This resulted in a crash, as I seem to get an extra callback to update[1] in a class that unregisters itself, however, there is no freeze. If I do not delete the object in question and simply unsubscribe from the update callback, there is no crash (just a leak).

I’ll try and post a reproducible sample, but going back a couple of additional commits and the same codebase works flawlessly for me.

Note: the update does fix the onSurfaceCreated crash on Android … so not all bad news :wink:

[1] I unsubscribe from the callback from within the callback. Example:

void someClass::update(ofEventArgs &args) {
    if (doDisconnect) {
            ofRemoveListener(ofEvents.update(), this, &someClass::update);
            callDisconnect();
            return;
    }
}

After the object unsubscribes and deletes itself, the event system tries to call update on it again and crashes.

Hope to update with more info soon.

i’ve just fixed the problem with the recursive calls but i can’t see anything like what you describe about all listeners being unsubscribed. can you post a minimal example that shows the problem?

I am likely misreading the cause, but it appears as if “all listeners were removed”, given calls to log from other active objects also stop producing output (including the ofApp::update callback).

I just pulled the update and am looking to test it shortly – hope to update you very soon.

Thanks.

@arturo Issue appears resolved for me – the recursive mutex fixes everything it seems. Thanks a lot!

Also tested, no crashes!! :+1: