Android/Mobile Microconference Notes
Welcome to Linux Plumbers Conference 2014. Please use this etherpad to take notes. Microconf leaders will be giving a summary of their microconference during the Friday afternoon closing session. Please remember there is no video this year, so your notes are the only record of your microconference.
Slides are at: http://www.linuxplumbersconf.org/2014/ocw/events/LPC2014/tracks/279
10/16
* Morten
* What's being done in the middleware?
* Rom, we change a task's OOM adjustment score
* Do you use nice?
* Riley, yes and cgroups
* Rom, our use of cgroups is coarse
* Juri Lelli, scheduling classes, latency requirement of each task
* Riley
* audio threads are given realtime priority
* Juri, the priority is your deadline in the end
* Energy management vs. Power Management
* The amount of power you burn
* How long can you run on your battery?
* Run at a lower energy
* big, you burn more energy
* Rom, the curve changes dramatically
* Morten, should we expose an energy model
* User space informing
* Zach, could Activity Manager give power hints?
* Riley, yes
* Tom, does Android know which are the critical sections, you would want to run on the fastest core
* What more can we know from userspace to make the heuristic better
* CFS scheduler, deadline (scheduler), realtime
* Put it in the power HAL hints?
* Core problem: how do you complete an amount of work in an amount of time with a minimum of energy?
* Rom, we added a hook that says
* We have a negative feedback, you want to maintain this level of performance
* CPU vs GPU
* Rom, we added a hint to the governor
* Improve on big.LITTLE
* Can we use cgroups
* Rom, you would create a foreground service
* Riley, you can use scheduling tasks
* Riley, we put background in a cgroup that gets 5%
* pin audio playback to little
* Juri, we could use cgroups and deadline
* touch boost
* heuristic
* feedback of actual energy measurement?
* ...
* SurfaceFlinger, you want to take it as long as possible
* you want to compress it in time for your UI
* Morten, we do track the amount of time
* Execution time doesn't tell the whole story
* Tom, for audio, do you differentiate a click vs. MP3 playback
* Riley, it would be interesting to pursue giving a latency constraint (or something) that this could endure energy efficient scheduling
* We don't like CPU hotplug
* What is used for CPU hotplug, can we get some heuristics?
* Morten, this is frowned upon
* If you use CPU hotplug
* leakage current goes up when process geometry goes down
* timer framework is not power aware
* work queues, kernel threads
10/16
Attending
Rom, Jon Corbet, Rodolph, Sumit, Tom Cooksey, Pawel, Laura, Jan Cho, Elliot (Google), Kevin, Serban, Maarten, Mark Chambroies, Karim, Daniel Vetter, Jesse Barnes, Laura Abbott, Praneeth Bajjuri, etc.
Minutes
* Speakers: Rom Lemarchand 1:30
* Karim Yaghmour introduction
* Substitute for John Stultz
* John has been keeping tabs on where things are at
* Binder still a key outstanding issue - not yet merged
* Greg KH to show up at some point to discuss
* The number of patches that need to be applied on top of mainline for Android is shrinking, so progress is being made
* Introduction to AARCH64 1:35-1:50
* Rodolph Perfetta/ARM Software & Sy
* Working on the ARMv8 JS engine
* ARMv8-A - allows 64 bit
* A32+T32
* A64 ISA
* A64 does not execute A32
* Big Endian for networking
* AArch32
* Crypto extensions, new FP insns, deprecated: SETEND/IT, obsoleted: SWP/SWPB, VFP short vectors, CP15 barriers (new insns)
* AArch64
* be careful
* add w1, w1, #0 // this is not a nop: it clears the top 32 bits of x1!
* x31 is a special stack pointer register
* FP regs - no more overlapping!
* 32 registers, each 128 bits
* No general purpose PC register
* SP must always be 128-bit (16-byte) aligned
* hardware checking of SP alignment is enforced
* AAPCS
* parameters in x0-x7
* return addr passed in x30 (LR)
* return values passed back in x0-x7
* Only a few insns can set flags
* adds, ands, subs and aliases
* No more load/store multiple
* ldp/stp (load/store pairs) instead
* Conditional select - csel insn
* Set a reg to one of its inputs depending on the condition
* can be used for conditional clear with the zero register xzr
* Moving code out of staging 1:50-2:15
* Greg K-H
* What do we merge
* binder
* "Binder is broken by design"
* New ABI for Binder64
* Binder is not to be used outside of Android
* Serban, with 64-bit we have a new ABI
* Greg, should we move binder to the mainline
* Rom, yes
* patch doing so has been sent to lkml and driverdev mailing lists: https://lkml.org/lkml/2014/10/16/359
* ashmem
* ZachP: "JohnS says ashmem is crucial"
* ACTION: Greg to talk to John about moving ashmem to
* sync will be talked about separately
* graphics guys
* Leave timed GPIO in staging
* lowmemorykiller, keep in staging, lmkv
* ION, not touching that
* Greg, let's move binder and ashmem
* Greg, new stuff?
* Rom, ADF
* ACTION: Graphics guys to talk about ADF and DRM/KMS
* dma-fence & Android Sync 2:15-2:40
* Riley
* Slides adapted from Erik Gilling and Jamie Gennis
* BufferQueue
* Built on Binder
* manage flow of buffers between producers and consumers
* two queues
* 90% of the time, SurfaceFlinger will be the consumer
* SurfaceFlinger
* Compositing all the surfaces
* HW Composer
* Ridiculous graphics subsystems
* "Becoming the HAL for all things display"
* What's broken with Sync?
* No explicit parallelism
* Every vendor implements implicit synchronization
* historically been the source of lots of lockups and bugs
* Galaxy Nexus has implicit sync
* "Terrible time for everyone"
* So sync was invented
* Pass sync objects with gralloc buffers backed by dmabufs
* Somewhat odd constructs
* Sync timeline
* A counter representing work for a given context
* Not an analog for time, just a counter
* Sync points
* a specific time on a timeline
* A promise for the future that this buffer will be done
* 3 states
* active (pending), signalled (complete), error
* The primitives that get passed around
* sync fences
* multiple sync points aggregated together - can be nested (fence inside fence)
* QUESTION: Maarten, if you duplicate a sync point, does it create a new point or take a reference on the existing one?
* Why
* not just for sw-to-sw or sw-to-hw sync, also for hw-to-hw for independent block signalling
* between a rendering and display engine, or between media decode engine and GPU
* Nvidia semaphores == sync points
* implementing a sync_timeline
* Try using sw_sync first
* Use sw_sync as a starting point
* Don't base a timeline on any "real" time
* Don't allow userspace to explicitly create or signal a fence
* Don't access sync_{timeline,pt,fence} elements explicitly
* Do provide useful names, timeline_value str, [...]
* OpenGLES integration
* What are the advantages of explicit sync?
* less behavior variation between devices
* better debugging support
* better jank metrics
* DMA Fence
* Maarten added this
* Upstream solution for cross-device synchronization
* Needed to support NVIDIA Optimus (graphics chip with no display hardware associated for multi-GPU)
* Merged in v3.17
* Used for a unified interface for cross-driver synchronization, and for tracking work on a dmabuf
* Optimus, Nvidia cross device sync, rely on Intel's hardware to actually display
* Compared to sync:
* one shot fences (active -> completed)
* sync_pt implemented on top of dma-fence in 3.17+, sync_fence can be created from a fence
* supports timeline-esque sequence number-based fences
* support HW dev-to-dev sync (nv semaphores)
* sync waits, async callbacks
* State transitions same as sync
* Maarten implemented sync on dma fence
* Missing from dmafence
* No official support for merge fences, but is planned
* No user space objects
* dmafences are tied to dmabufs
* buffer handle and a sync object - with a sync
* Differences
* explicit sync and user space interface
* Nouveau people want to use it
* Intel people want to use it
* Discussion questions:
* Is there a need for explicit sync? Do we need to do both?
* Performance of bindless/compute
* Improve perf with suballocation
* ACTION: Riley to open source his sync unit tests
* How to get Mesa to pass the piglit tests with sync
* ChromeOS folks moving to change their sync model
* NEED: an open source userspace to test
* "Still need implicit sync so that the kernel knows what's going on"
* "When we [Intel] add atomic modesetting to our DRM driver, we'll add ADF support"
* ION and the DMA coherency model 2:40-3:10
* Laura Abbott/Qualcomm
* Work on memory
* "ION is doing many things wrong with respect to consistency"
* What is meant by coherency?
* When a device/CPU writes to memory, what will be observed?
* May be implicit or explicit
* ION: memory manager written for Android
* primarily for graphics, but used elsewhere too
* generally an allocator, but also a dmabuf exporter
* Buffers for SurfaceFlinger
* ION terminology
* carveout heap, system heap, CMA heap
* carveout heap
* sync'ed at creation time with dma_sync_sg_for_device (no DMA map first)
* memory permanently removed from the buddy system
* system heap
* allocated via alloc_pages
* CMA heap
* allocated via dma_alloc_coherent
* non-coherent allocs disallowed
* ION cached (non-coherent memory)
* Carveout heaps sync'ed at free time, CMA heaps disallowed, system heap syncs right after allocation time
* Sync mechanism
* explicit ioctl: ION_IOC_SYNC - similar to dma_sync_sg
* faulting mechanism: similar to dma_buf docs
* ARM doesn't support dma-non-coherent
* What's wrong?
* Using DMA sync API without map first
* no guaranteed enforcement
* No need to sync at alloc time
* ION non-cached (coherent) buffers
* carveouts, CMA do nothing
* system heap page pools to avoid repeated alloc/free overhead
* keeps a pool of recently-freed buffers
* free op will zero & place in pool
* shrinker drains pool as needed
* What's wrong?
* still using DMA sync APIs without a map for pooling
* CMA might be fully correct here
* What to do about ION?
* ION as completely separate graphics framework?
* More practically
* Just pull ION alloc methods into the DMA layer - ION should stop pretending to be this
* Big issue: don't know what device is being allocated for
* delayed freeing throws this off
* ION should just stop trying to do anything with coherency
* One option:
* Must call standard DMA APIs to get coherency
* Breaks the page pooling optimization for uncached pages - page pooling is a must
* Other thoughts
* Hiroshi Doyu's 2014-March patches for IOMMU mappings for attach
* Convert sync ioctls to fences?
* Audience: "fences are not coherency..", "don't think you should overload fences"
* Laura: "ideally I'd like to see the explicit sync operation go away completely"
* clients get it wrong a lot
* What to do about dma_buf ops?
* Something is wrong here
* There is a missing map call
* Audience: "never got around to it"
* Discussion prompt: How do we move the DMA layer abuse out of ION?
* Danvet: "we can't"
* "allocation and syncing is fully opaque"
* "no way to do it for multiple devices"
* Laura: Is ION a special-case of a DRM driver?
* Danvet: "on x86 everything is coherent"
* "otherwise we don't have problems with iommus"
* Rob Clark posted some patches a while ago to use helpers
* Other audience: "allocation and memory management can be separate"
* can allocate memory that may never be mapped by the CPU
* assume there is no CPU-side mapping at all
* Laura: what about the uncached allocation case?
* then userspace wants to call mmap() and write to it, but it has not been sync'ed yet
* DMA API doesn't work any more with multiple devices (?)
* What about the page pooling?
* Daniel V (danvet): "every DRM driver has their own private page pool"
* Jerome Glisse: DMA API needs to be split: allocation vs. synchronization
* JeromeG: Have 2, 3, or 4 different allocators
* JeromeG: on x86 syncs will be no-ops due to coherency
* JeromeG: Pools should be shared across the system
* Laura: part of this is there already, it's just not available for map
* Laurent: at the moment we don't have dma_alloc_noncoherent() for ARM, but this can be added
* DanielV: easiest way would be to get at the allocator without needing a device
* Laura: Sumit's new allocator gets a little closer
* Laura: would needing to have an associated device require rewriting the Android graphics framework?
* Sumit: There's no way for the exporter and importer to know how the buffers are mapped
* Why do ION and all other drivers require different allocators?
* Laura: ION future: wrappers around DMA APIs in the kernel and ioctl()s for userspace apps to use
* Sumit: ION ioctls tell what heap to pick from
* that's what we were trying to avoid with delayed allocation
* Next step
* Laura: Look at Sumit's work and see what can be unified with ION
* Laura: Can dma_alloc_noncoherent() be used together with sync?
* Split alloc and map
* Moving Android towards clang 3:10-3:25
* Bernhard Rosenkränzer "Bero"
* got it booting on Nexus 7 and 10 and runs many apps
* Nexus 4 and 5 still problematic
* Overall 112 patches submitted: 74 accepted, 34 waiting, 4 abandoned
* git://android.git.linaro.org/aosp-patchsets.git
* Cheating twice
* Setting LOCAL_CLANG := false for /init and the GLESv1/v2 wrappers
* causes those to be built with gcc
* Renato: I thought Google said they fixed those?
* Elliot: I think Google meant that they were working on those?
* Elliot: Also there are many places where assembler is not used
* clang /init: reboots the machine instantly before anything comes up
* clang GLESv2 crashes the UI on startup
* Perf check
* clang AOSP binaries are 2.6% than gcc builds
* perf: gcc still ahead
* clang 20% faster at 'make droidcore'
* Sometimes clang is more picky than gcc
* register keyword usage
* array subscript of type "char"
* undef'ed internal fns, undef'ed vars
* [...]
* empty structs
* unused params
* "add -1" asm converted to "sub 1" by gas, not by clang
* Renato: gas is not wrong, "just weird"
* Renato: do you want to see support for this in the clang assembler?
* Elliot: we would like a flag to state "be more compatible with the old stuff"
* clang too picky when the proper behavior could be inferred
* "too much to fix"
* "about 10 different issues can cause thousands of problems"
* Renato: help the people that are trying to help clang
* Audience 2: we should use whatever the UAL describes
* Renato: the UAL doesn't say anything about this
* complains about dead code
* sometimes clang finds real bugs
* MPEG/TP decoder, qcom camera HAL
* bluetooth kernel module
* gcc extensions
* AOSP used to use nested functions, __builtin_va_arg_pack, VLAIS, VLA for non-POD types
* 1 clang bug
* char array alignment to pagesize
* TBD
* investigate crashing apps, fix /init and GLES wrappers, test other arches, set up daily builds
* update AOSP clang, fix build failures, test different compiler options, build kernel with clang also, optimize
* What else?
* (no comments)
* Elliot, clang assembler is not happy
* Bero, we have a global -no-integrated-as
* clang is picky about size, GCC is a little less picky and will pick on
* Renato, should we add a flag that says "ignore bad asm" so that we can work to deprecate it
* Elliot, I think this ^^^^ is why we're not building libc with clang
* Intel writes all their asm in gas style
* We have 10 classes of issues (1000's of instances)
* Renato, you could say "add w1,w1,#-1" is wrong
* I think we should be pragmatic, where do you draw the line
* Bionic - 64 bit ABI and 64-bit ART 3:25-3:45
* Elliot Hughes/Google
* re clang:
* Google does have someone working on clang on x86_64
* libelfutils has nested functions and is clang-hostile
* heavy use of nested fns
* "every time libelfutils folks have told folks to 'go away'"
* FORTIFY_SOURCE
* compile-time warnings inserted by gcc
* "lots of trouble getting that to work with clang"
* Why bionic?
* glibc: it's LGPL
* BSD libc: using the Linux kernel: different kernel interface
* getcwd(): if Linux can't fit the result in a page, Linux sends an error, BSD does not
* BSD getentropy() vs getrandom()
* musl: didn't exist, now we are painted into an API corner
* Zoidberg: ???
* What is bionic?
* Large parts of FreeBSD, NetBSD, OpenBSD libc + script-gen'd system call stubs + homegrown Linux code if needed
* Elliot, mostly OpenBSD
* Support for arm, arm64, mips, mips64, x86, x86-64
* Has custom pthread impl that uses Linux futexes
* Elliot, we don't have to think hard, which reduces error, by autogening
* Elliot, we kind of support MIPS 64
* New MIPS 64 is different than 90's MIPS64
* What's unusual?
* No separate libpthread/libresolv, no __isthreaded, no support for old kernels, no legacy syscalls, non-_r fns usually thread-safe, lots of unit tests, all pic/pie
* All the select()-related stuff collapsed down into 1 syscall
* strace will show openat(), pipe2(), etc.
* Third-party stuff is still a problem since we don't support old kernels
* versioning
* 3rd party devs tend to compile a small part of their code that gets statically linked to the old code
* Static binaries tend to be very large
* _r fns
* readdir_r() is the worst
* "no one knows how big the buffer should be"
* most functions in Bionic just use TLS
* PIC/PIE
* security guys love ASLR, it's now mandated
* What's in the headers?
* BSD, homegrown from POSIX specs, uapi headers from the kernel
* with Lollipop, all the uapi stuff has a known source
* problem is that there are some missing pieces
* e.g. from elf.h: only contains what the kernel needs
* mostly broken
* Spotty C99/C11 support
* No particular ver of POSIX
* Missing kernel APIs they would like to use
* set_tid_address(2)/clone(2) tid maintenance flags
* getrandom()
* atomic reading of /proc/pid/maps?
* Games may have mapped 1000's of entries
* Syscall to query a single mmap region's info
* MINHERIT
* Elliot, pipe3, Michael, no still pipe2
* Elliot, all user code is a jnilib, everything is a pthread as a result
* Elliot, Froyo, GB we got SMP
* readdir and readdir_r are always getting this wrong, all non-r are thread safe
* All SEC guys love PIE
* homegrown based on POSIX
* pthread
* we export all uapi under
* in Lollipop we now have
* SysV IPC?
* bionic has been sparse
* Elliot, we don't hear a lot of requests for this
* musl
* If you need to statically link something, consider this ^^^^
* Porting Generic Android Drivers and 64-bit Binder ABI 3:45-4:05
* Serban Constantinescu/ARM
* Android on 64-bit kernels with 32-bit userspace
* Some challenges...
* ioctl interface
* ashmem returning -ENOTTY
* problem because size_t is different sizes between kernel and userspace toolchain
* solution: use compat_size_t
* File operation table
* added .compat_ioctl handler
* ABI
* struct flat_binder_object {}
* Some members used unsigned long; changed to use __u32
* Took advantage of the changes to add __user annotations to the member type decls
* Alignment
* Fix for 64-bit struct was to add __attribute__((packed))
* Portable Android Device Driver Tips
* compat_ioctl, explicit size types, native kernel types (pid_t, key_t, etc.), etc.
* Upstream your changes!
* Binder
* Android IPC
* kernel component + userspace component
* used for everything
* 32-bit process needs to talk to 64-bit process
* "both of these worlds need to speak the same protocol"
* Userspace compat layer
* Most changes hidden in the UAPI structures
* Userspace compile switch
* All expansion done in the userspace
* Same ABI for 32, 64bit
* 64-bit Binder API
* Try not to depend on the low-level implementation
* libbinder & Service Manager talk with Binder kernel driver
* Java API and system libraries use libbinder
* You ask Service Manager for the handle of a service
* replace the userspace libbinder for 64-bit support
* Proposal:
* 1. New ServiceManager
* reimplement on top of libbinder
* available for review
* 2. Proof of concept: new libbinder
* Test with BinderAddInts
* Android RPC benchmark
* BinderAddInts
* RPC of remote addition
* simple example
* Migrating code from ARM to ARM64 4:05-4:30
* Kevin Petit/ARM
* Writing portable code
* Use size_t and ssize_t instead of int
* printf: %zu, %zd
* Be careful of bitshifts greater than 31 bits
* type promotion
* if the second promotion signed -> unsigned happens before an int -> long promotion, the sign bit is lost on 64 bit
* sign extension
* solution: cast to 64-bit before the cast to unsigned
* (discussion of ARM64 features vs. previous ARM)
* no user-accessible PC, Zero register, stack pointer register, SP is stack aligned
* only a few insns can use SP register (x31)
* SP must be 16-byte aligned
* acquire/release memory ordering instructions
* conditional execution
* no more conditional execution on most instructions
* gcd() implementation: ARMv8 uses CSEL, CSNEG
* NEON now mandatory and part of the main insn set
* Sets the core execution flags (NZCV) rather than NEON-specific flags
* NEON operand size flags are now specified on the operands, rather than part of the insn
* NEON shorter registers are no longer packed into the larger registers
* Register aliasing will get in the way
* must be done by hand
* Renato: use compiler intrinsics
* Avoid legacy insns
* SWP, SETEND, CP15 barriers, IT, VFP short vec
* Power management discussion: big.LITTLE 6:30 room 26, will be in 26
* https://etherpad.fr/p/LPC2014_Mobile (pwsan)
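Editor's sketch for the "Writing portable code" notes in the ARM to ARM64 migration talk (type promotion, sign extension, shifts greater than 31 bits). This is a minimal illustration and not code from the slides: the function names add_naive, add_widened and bit are hypothetical, and the sketch assumes a 32-bit int, as on ARM and ARM64.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: int and unsigned int are 32 bits wide. */

/* Pitfall: b is converted to unsigned int for the addition, so the sum is
 * computed modulo 2^32 and only afterwards widened (zero-extended) to
 * 64 bits. The sign of b is lost. */
static int64_t add_naive(int b, unsigned int c)
{
    return b + c;
}

/* Fix from the notes: cast to 64-bit first, so b is sign-extended before
 * any signed -> unsigned conversion can happen. */
static int64_t add_widened(int b, unsigned int c)
{
    return (int64_t)b + c;
}

/* Shifts by 32 or more need a 64-bit left operand; a plain "1 << n" is an
 * int shift and is undefined for n >= 32. */
static uint64_t bit(unsigned int n)
{
    assert(n < 64);
    return (uint64_t)1 << n;
}
```

With these definitions, add_naive(-2, 1) yields 4294967295 when the result is 64 bits wide, while add_widened(-2, 1) yields -1. Sizes computed from such values are printed with %zu (size_t) or %zd (ssize_t), as the notes suggest.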