memd — References

ESP32-S3-N16R8 + INMP441 | June → November 2026


How To Use This

Don't read linearly. Follow PHASE tags. Each resource says exactly what to read and why. Stop at the annotation — more is not better, faster is.

Cross-reference with checkpoints.md — when you hit a checkpoint, the relevant papers for that phase are listed here. Cross-reference with roadmap.md for the building context around each resource.

Landmark Papers at the bottom are for Twitter threads and blog posts, not for building memd directly. Read them fully. Write publicly about them. They build credibility independently of the project.


Before Phase 0 — Meta Reading

The Double-Edged Nature of Hierarchical Knowledge https://www.justinmath.com/the-double-edged-nature-of-hierarchical-knowledge/ Read before starting. Explains why you learn the minimum viable layer before going deeper.

Skill Trees https://tasshin.com/blog/skill-trees/ Read before starting. Explains why this curriculum is shaped the way it is.


PHASE 0 — C Fundamentals

June, max 2-3 weeks | Then learn by doing

You know Python and Go. You need enough C to write a binary file loader, a struct-based tensor, and manual memory management. That's it. Start building at week 3 regardless.

Beej's Guide to C Programming https://beej.us/guide/bgc/ Read: Chapters 1–8, Chapter 14 (File I/O), Chapter 17 (types and sizes). Skip: threading, networking — not needed for memd. Why not K&R: Beej is faster, more direct, free online.

Stanford CS Library: Essential C — Nick Parlante https://cslibrary.stanford.edu/101/EssentialC.pdf 45 pages. Read pages 20–35 only. Best explanation of heap vs stack that exists.

Beej's Pointers chapter https://beej.us/guide/bgc/html/split/pointers.html Read twice. memd-rt does uint8_t *ptr = weights_buffer + offset constantly. If this isn't intuitive, nothing will be.
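
A minimal sketch of that pointer-plus-offset pattern, in case it helps while reading the chapter. The tensor_view struct and tensor_at helper are hypothetical names for illustration, not memd-rt's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view onto one tensor inside a single weights blob. */
typedef struct {
    const uint8_t *data;  /* points into weights_buffer, owns nothing */
    size_t len;           /* number of bytes in this tensor */
} tensor_view;

/* Return a view of `len` bytes starting `offset` bytes into the blob. */
static tensor_view tensor_at(const uint8_t *weights_buffer, size_t buffer_len,
                             size_t offset, size_t len)
{
    /* Bounds check: the view must lie entirely inside the blob. */
    assert(offset <= buffer_len && len <= buffer_len - offset);
    tensor_view v = { weights_buffer + offset, len };
    return v;
}
```

Adding an integer to a uint8_t pointer moves it by that many bytes, which is why byte offsets stored in a file can be used directly.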

Beej's malloc/free chapter https://beej.us/guide/bgc/html/split/memory-management.html memd pre-allocates all buffers at init time. Understand why malloc during inference is bad before you write the first line of memd-rt.
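
One common way to pre-allocate is a bump arena: a single malloc at init, then fixed slices handed out from it. The sketch below assumes that approach; the arena type and its functions are hypothetical, and on the ESP32-S3 the one malloc would be replaced by whatever allocator you choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump arena: one malloc at init, none during inference. */
typedef struct {
    uint8_t *base;
    size_t capacity;
    size_t used;
} arena;

/* Reserve the whole budget up front. Returns 0 on success, -1 on failure. */
static int arena_init(arena *a, size_t capacity)
{
    assert(a != NULL);
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out the next `size` bytes, rounded up to a multiple of 4.
   Returns NULL when the budget is exhausted instead of growing. */
static void *arena_alloc(arena *a, size_t size)
{
    size_t aligned = (size + 3u) & ~(size_t)3u;
    if (aligned < size || aligned > a->capacity - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Release everything at shutdown; individual slices are never freed. */
static void arena_free_all(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = a->used = 0;
}
```

Because every slice is carved out at init, an out-of-memory condition shows up once at startup rather than in the middle of an inference.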

cppreference — fixed-width integer types https://en.cppreference.com/w/c/types/integer Bookmark this page. You'll return to it constantly. uint8_t = quantised weights. int32_t = accumulator. float = scale factors.
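
A small sketch of how those three types typically meet in one loop. It assumes an affine scheme (stored value = real / scale + zero point); dot_u8 and its parameters are hypothetical names, not memd-rt's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of uint8_t quantised values with an int32_t accumulator.
   Each operand is stored as q = round(x / scale) + zero_point, so the
   zero points are subtracted before multiplying. */
static float dot_u8(const uint8_t *w, uint8_t w_zero, float w_scale,
                    const uint8_t *x, uint8_t x_zero, float x_scale,
                    size_t n)
{
    assert(w != NULL && x != NULL);
    /* Each product is at most 255 * 255 in magnitude, so int32_t holds
       roughly 33,000 of them before it could overflow. */
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += ((int32_t)w[i] - w_zero) * ((int32_t)x[i] - x_zero);
    }
    /* One float multiply per output converts back to real units. */
    return (float)acc * w_scale * x_scale;
}
```

The casts to int32_t matter: without them the subtraction would still be promoted to int, but writing them out keeps the intended widths visible.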

Bitwise operations in C https://www.geeksforgeeks.org/bitwise-operators-in-c-cpp/ Read: AND, OR, right shift. You'll use >> shift for fast INT32 → INT8 rescaling in quantisation.
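
A sketch of the shift-based rescale, assuming the scale has already been rounded to a power of two. rescale_shift is a hypothetical helper; a real runtime usually combines the shift with an integer multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Rescale an INT32 accumulator to INT8 by dividing by 2^shift.
   Rounds to nearest and saturates to the int8_t range.
   Note: >> on a negative signed value is implementation-defined in C;
   this sketch assumes the usual arithmetic (sign-preserving) shift. */
static int8_t rescale_shift(int32_t acc, unsigned shift)
{
    assert(shift >= 1 && shift <= 30);
    /* Add half the divisor before shifting so the result rounds to
       nearest instead of always rounding down. */
    int64_t rounded = ((int64_t)acc + ((int64_t)1 << (shift - 1))) >> shift;
    if (rounded > INT8_MAX) rounded = INT8_MAX;
    if (rounded < INT8_MIN) rounded = INT8_MIN;
    return (int8_t)rounded;
}
```

The saturation step is what keeps an unusually large accumulator from wrapping around to the opposite sign.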

Beej's File I/O chapter https://beej.us/guide/bgc/html/split/file-io.html fread, fwrite, fopen("rb"). Your .memd weight loader is these three functions.
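
A sketch of the read half of such a loader, assuming the whole file is read into a caller-owned buffer. load_weights is a hypothetical name and the .memd header layout is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read a weight file into a caller-owned buffer of `capacity` bytes.
   Returns the number of bytes read, or -1 if the file cannot be
   opened, a read error occurs, or the file does not fit. */
static long load_weights(const char *path, uint8_t *buf, size_t capacity)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "rb");  /* "rb": binary mode, no newline translation */
    if (!f) return -1;
    size_t n = fread(buf, 1, capacity, f);
    /* If the buffer filled up, check whether more data remains. */
    int too_big = (n == capacity) && (fgetc(f) != EOF);
    int failed = ferror(f);
    fclose(f);
    return (too_big || failed) ? -1 : (long)n;
}
```

Passing the buffer in, rather than allocating inside the loader, fits the pre-allocate-at-init rule from the malloc chapter.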

A Simple Makefile Tutorial https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/ Read all 7 parts. You need a Makefile for desktop testing before porting to ESP-IDF's CMake system.


PHASE 0 — ESP32-S3 and Audio Toolchain

June Week 1, parallel with C reading

ESP-IDF Getting Started https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/get-started/ Follow for your OS. Hello World flashing before anything else. Don't skip this step — Checkpoint 0.1.

ESP32-S3 Technical Reference Manual — Memory chapter only https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf Read Chapter 3 only. 30 minutes. IRAM, DRAM, PSRAM, flash are four different things with four different access speeds. memd-rt's performance depends on understanding which memory holds what. Relates to Checkpoint 0.3.

ESP32-S3 heap capabilities https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/system/mem_alloc.html heap_caps_malloc(size, MALLOC_CAP_INTERNAL) forces SRAM allocation. Critical — inference buffers must be in SRAM not PSRAM.

ESP32-S3 I2S driver https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/peripherals/i2s.html Read the standard mode + MEMS mic example. Get INMP441 capturing before writing any ML code — Checkpoint 0.2.

INMP441 datasheet https://invensense.tdk.com/wp-content/uploads/2015/02/INMP441.pdf Read: pin configuration, I2S timing, sensitivity spec. 20 minutes. Know what your microphone actually measures before wiring it.

ESP-DSP library — FFT https://docs.espressif.com/projects/esp-dsp/en/latest/esp-dsp-algorithms.html Read the fft_f32 example. You use esp-dsp for the FFT step in memd's mel spectrogram pipeline. Don't write FFT from scratch — it's a separate month-long project and not memd's contribution.


PHASE 1 — Audio ML Fundamentals

June, parallel with hardware setup

Feature Extraction — Read In This Order

Speech Processing for Machine Learning — Haytham Fayek https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html Read fully first. 30 minutes. The clearest explanation of MFCC and mel spectrograms on the web. Read this before any paper or tutorial. Connects directly to Checkpoint 1.1.

Mel Frequency Cepstral Coefficients explained https://medium.com/prathena/the-dummys-guide-to-mfcc-aceab2450fd Supplementary. The mel filterbank diagram is useful when you're implementing it in C.

Datasets for memd-net Training

ESC-50: Dataset for Environmental Sound Classification — Piczak 2015 https://github.com/karolpiczak/ESC-50 Primary dataset. 2000 clips, 50 classes, 5 seconds each. You'll use 8 classes. Read the README fully before downloading — class taxonomy and official train/test fold split matter for reproducible results. Download this in week 2.

Speech Commands — Warden 2018 https://arxiv.org/abs/1804.03209 Secondary dataset for speech vs non-speech. Read Abstract and Section 2 only.

The Paper That Defines memd's Research Space

MCUNet: Tiny Deep Learning on IoT Devices — Lin et al., MIT 2020 https://arxiv.org/abs/2007.10319 Read fully. This is your primary citation for everything. They fit a neural net in 256KB SRAM — your budget is 512KB. Their core insight: co-design the neural architecture with the inference memory schedule. That's exactly what you're doing with memd-net. Read in week 1.

MCUNetV2 — Lin et al., MIT 2021 https://arxiv.org/abs/2110.15352 Read after V1. Patch-based inference to reduce peak activation memory further. Understand the idea even if you don't implement it.

MobileNetV1 - Howard et al. 2017 https://arxiv.org/abs/1704.04861 Skim in 20 minutes. Section 3.1 derives why a 3x3 depthwise separable convolution needs 8-9x fewer multiply-adds than a regular convolution. memd-net's architecture is built on this operation.
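
The 8-9x figure can be checked by counting multiply-accumulates directly. A sketch, with hypothetical function names and a layer shape picked only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-accumulates for a standard k x k convolution producing an
   out_hw x out_hw feature map with c_in input and c_out output channels. */
static uint64_t macs_standard(uint64_t k, uint64_t c_in, uint64_t c_out,
                              uint64_t out_hw)
{
    return k * k * c_in * c_out * out_hw * out_hw;
}

/* Depthwise (one k x k filter per input channel) followed by a
   pointwise 1 x 1 convolution that mixes channels. */
static uint64_t macs_separable(uint64_t k, uint64_t c_in, uint64_t c_out,
                               uint64_t out_hw)
{
    assert(k > 0 && c_in > 0 && c_out > 0);
    uint64_t depthwise = k * k * c_in * out_hw * out_hw;
    uint64_t pointwise = c_in * c_out * out_hw * out_hw;
    return depthwise + pointwise;
}
```

For k = 3, 64 input channels, 128 output channels, and a 32 x 32 output, the counts are 75,497,472 and 8,978,432, a ratio of about 8.4. The ratio approaches 9 (that is, k squared) as the output channel count grows.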

PyTorch — Saving and Loading Models https://pytorch.org/tutorials/beginner/saving_loading_models.html You need: iterate over named parameters, export weights as numpy arrays. That's all. Read before Checkpoint 2.1.


PHASE 2 — Quantisation

July | Read before implementing

Read In This Order

Quantization in Deep Learning — Hugging Face blog https://huggingface.co/docs/optimum/concept_guides/quantization Read first. Plain English explanation before the academic paper. 20 minutes.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference - Jacob et al., Google Brain 2018 https://arxiv.org/abs/1712.05877 Read: Abstract and Section 2 (the quantisation scheme in 2.1, integer-arithmetic-only matrix multiplication in 2.2). Section 3 covers training with simulated quantisation, which you can leave until basic PTQ works. The canonical INT8 paper. The formula for combining input and weight scale factors is in Section 2. You will read this multiple times. Start reading this in parallel with float32 runtime work.

A White Paper on Neural Network Quantization - Nagel et al., Qualcomm 2021 https://arxiv.org/abs/2106.08295 Read: Sections 1, 2, and 3 (PTQ). Section 4 covers quantisation-aware training and can wait. More practical than Jacob et al. Qualcomm wrote this for engineers implementing quantisation, which is what you are doing. Directly relevant since your cold mail targets include Qualcomm-adjacent labs.

PyTorch Quantization documentation https://pytorch.org/docs/stable/quantization.html Read "Introduction to Quantization" only. The per-tensor vs per-channel diagram is the clearest visual of the difference.

Audio-specific quantisation insight (not in any paper; this is yours): Mel spectrograms have non-uniform energy distribution across frequency bins. Low mel bins (bass frequencies) have much higher energy than high mel bins. Per-tensor quantisation computes one scale across all bins, so the loud low bins set the scale and the quiet high-frequency bins collapse into a handful of quantisation levels. Per-channel quantisation (one scale per output channel) fixes this. This finding is not documented in standard quantisation tutorials. Finding and documenting it is memd's contribution to the audio quantisation space. See L1.1 and Checkpoint 2.4 in checkpoints.md.


PHASE 2 — Runtime Implementation References

July-August | Read, don't copy

GGML — quantisation source https://github.com/ggerganov/ggml Read: src/ggml-quants.c, search for ggml_quantize_q8_0. Read that function. The structure of your INT8 quantise function will resemble this. Do not copy — read for structural understanding.

Darknet — convolutional layer https://github.com/pjreddie/darknet Read: src/convolutional_layer.c and src/blas.c. You're not using darknet. Reading it for conv2d loop structure and memory access pattern — how does someone lay out a C convolution loop? This is your reference for Checkpoint 2.2.
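
For orientation before opening the source, here is the direct nested-loop form of a convolution. This is a generic sketch, not Darknet's code; the function name, the row-major layouts, and the no-padding, stride-1 restriction are all assumptions made to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Direct 2D convolution, no padding, stride 1, single image.
   Layouts (all row-major, an assumption of this sketch):
     in  [c_in][h][w]
     wgt [c_out][c_in][k][k]
     out [c_out][h - k + 1][w - k + 1] */
static void conv2d_naive(const float *in, const float *wgt, float *out,
                         size_t c_in, size_t c_out,
                         size_t h, size_t w, size_t k)
{
    assert(k >= 1 && h >= k && w >= k);
    size_t oh = h - k + 1, ow = w - k + 1;
    for (size_t co = 0; co < c_out; co++)
        for (size_t y = 0; y < oh; y++)
            for (size_t x = 0; x < ow; x++) {
                float acc = 0.0f;
                /* Sum over every input channel and kernel position. */
                for (size_t ci = 0; ci < c_in; ci++)
                    for (size_t ky = 0; ky < k; ky++)
                        for (size_t kx = 0; kx < k; kx++) {
                            float v = in[(ci * h + y + ky) * w + x + kx];
                            float g = wgt[((co * c_in + ci) * k + ky) * k + kx];
                            acc += v * g;
                        }
                out[(co * oh + y) * ow + x] = acc;
            }
}
```

The innermost index (kx) walks contiguous memory in both the input and the weights, which is the memory access pattern worth comparing against when you read a production implementation.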

GGUF format specification https://github.com/ggerganov/ggml/blob/master/docs/gguf.md Read metadata and tensor data sections. Design your .memd binary format by simplifying this, not by inventing from nothing.

ESP-DL — Espressif's ML framework https://github.com/espressif/esp-dl Read esp_nn/src/. You benchmark against ESP-DL at Checkpoint 2.6. Understanding what it does (PIE SIMD intrinsics, memory-aligned buffers) tells you exactly why it's faster and what your blog post should say about the gap.

ESP32-S3 PIE vector instructions https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/xt-isa-overview.html Skim the vector instruction section. You don't implement these — plain C first. But knowing they exist explains the ESP-DL benchmark gap. This is what Blog Post 1 explains.


PHASE 3 — Memory System and Backend

September

LittleFS on ESP-IDF https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/storage/littlefs.html memd's flash filesystem. Read the getting started example before Checkpoint 3.1.

ESP-IDF HTTP Server https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/protocols/esp_http_server.html For WiFi sync to Python backend. Read basic example before Checkpoint 3.3.

ESP-IDF SNTP (NTP time sync) https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/system/system_time.html Read the SNTP section. Required for Checkpoint 3.2. See L3.1 in checkpoints.md.

ESP-IDF Deep Sleep https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/system/sleep_modes.html For power duty cycle measurement. Measure current draw: continuous inference vs VAD-triggered with deep sleep. The difference is dramatic and tweetable.

SQLite Python documentation https://docs.python.org/3/library/sqlite3.html memd backend store. The knowledge graph is two tables: events and edges. Read the basic tutorial.

Ollama — local LLM inference https://ollama.com Optional but recommended. Run llama3.2 locally. "What did I spend most of yesterday doing?" becomes a real query. Install guide on front page. Adds 2-3 days, makes the demo significantly more compelling.


PHASE 4 — Research Papers

Month 4+, search-driven

How To Find Papers That Are Actually Interesting

Start from your build problems, not abstract topics.

Your actual memd build problems → your search queries:

  • Hit mel filterbank quantisation bias → search mel spectrogram quantisation edge inference
  • Hit peak activation memory pressure → search peak memory reduction inference microcontroller
  • Hit session merging question → search episodic memory formation temporal segmentation
  • Hit VAD false positive problem → search voice activity detection microcontroller low power

Use Semantic Scholar (semanticscholar.org). Better than Google Scholar for CS. Filter: last 3 years, Computer Science.

Read each paper: abstract + conclusion (30 seconds) → figures + tables (2 minutes) → intro (3 minutes) → method section only for parts solving problems you've personally hit.

You need 15-20 papers read deeply. Not 100 shallowly.

Papers in Order

1. Jacob et al. 2018 — already listed above, read in Phase 2

2. MCUNet + V2 — already listed above, read in Phase 1

3. Han et al. — Deep Compression, 2016 https://arxiv.org/abs/1510.00149 Foundational model compression: pruning + quantisation + Huffman coding. Read Abstract, Sections 1-3.

4. Banbury et al. — MLPerf Tiny Benchmark, 2021 https://arxiv.org/abs/2106.07597 How to benchmark inference on microcontrollers correctly. Model memd's benchmarking methodology on this paper. Your benchmark numbers in Blog Post 1 should follow MLPerf Tiny's reporting format.

5. Zhang et al. — Hello Edge: Keyword Spotting on Microcontrollers, 2017 https://arxiv.org/abs/1711.07128 The most directly relevant audio + microcontroller paper. Read fully. Keyword spotting is adjacent to memd's audio event classification. Your problem is harder (8 environmental classes vs limited vocabulary keywords) but the hardware constraints are identical.

6. Zeghidour et al. - LEAF: A Learnable Frontend for Audio Classification, 2021 https://arxiv.org/abs/2101.08596 Learnable frontend: instead of fixed triangular mel filters, learn the filterbank from data. Interesting contrast to memd's fixed filterbank. The question for your blog: what would a learnable frontend cost on ESP32-S3, and is the accuracy gain worth it? (Answer: probably not, the weights add memory pressure and fixed filters are interpretable.) Good research question framing.

7. Nagel et al. - AdaRound (Up or Down? Adaptive Rounding for Post-Training Quantization), 2020 https://arxiv.org/abs/2004.10568 Advanced PTQ rounding strategy. Read after implementing basic PTQ. Shows you where the field moved after Jacob et al. "What memd-rt could do next for quantisation" is a natural section of Blog Post 2.

Conferences to Follow

Subscribe to proceedings on Semantic Scholar. Skim new paper titles when they drop.

  • MLSys (proceedings.mlsys.org) — systems + ML intersection. Your primary conference.
  • SenSys (dl.acm.org/conference/sensys) — embedded systems + sensing. memd's application space.
  • TinyML Summit — MCU-scale ML specifically. Smaller but most directly relevant.
  • Interspeech / ICASSP — audio ML. For the audio-specific angle.
  • ASPLOS — architecture + systems. For the inference runtime angle.

Landmark Papers — Twitter Threads and Blog Posts

Read each fully. Write one Twitter thread per paper. The angle matters more than the summary — find what's non-obvious or counterintuitive and lead with that. These build public credibility independent of memd.

Post one thread per week from Month 4 onwards. Don't rush them earlier — build memd first.


Attention Is All You Need — Vaswani et al., Google Brain 2017 https://arxiv.org/abs/1706.03762 The transformer paper. Thread angle: "The title is a flex aimed at the whole field. Here's what they were replacing (RNNs), why RNNs were fundamentally broken for long sequences, and why the queries/keys/values formulation is a genuinely clever idea. Most people discuss this paper without having read it." memd connection: transformers are terrible on microcontrollers — too many parameters, attention is O(n²). memd-net uses a CNN because of this. "Why I chose a CNN over a transformer for memd" is a strong blog post section.


Sparks of AGI: Early Experiments with GPT-4 - Bubeck et al., Microsoft Research 2023 https://arxiv.org/abs/2303.12712 155 pages. Microsoft researchers probed GPT-4 for months. Thread angle: "Microsoft researchers wrote a 155-page paper claiming GPT-4 shows early AGI sparks. Here are the 5 most surprising findings, and the 2 most important criticisms of their methodology." memd connection: the thing GPT-4 lacks is persistent memory across conversations. memd is a hardware attempt to provide exactly that layer. Possible blog framing: "The memory gap that Sparks of AGI didn't address, and why it matters for your hardware layer."


MemGPT: Towards LLMs as Operating Systems — Packer et al., UC Berkeley 2023 https://arxiv.org/abs/2310.08560 LLM with hierarchical memory — main context + external storage, self-directed memory management. Thread angle: "What if the LLM decided what to remember and what to forget? Berkeley built exactly that. Here's how MemGPT's memory architecture works — and how memd is solving the same problem at the hardware capture layer." Most directly relevant paper to memd's concept. Read this fully before writing Blog Post 2.


Generative Agents: Interactive Simulacra of Human Behavior — Park et al., Stanford 2023 https://arxiv.org/abs/2304.03442 25 AI agents in a simulated town forming memories, making plans, having conversations. Thread angle: "Stanford gave 25 AI agents memory, reflection, and planning. They formed friendships, spread gossip, organised a party. Here's their memory architecture — observation → reflection → retrieval — and why it's surprisingly close to what memd does at the hardware layer." Their memory pipeline (observation → reflection → retrieval) is the same structure memd approximates. This is the paper to cite when you explain memd's conceptual framing.


Whisper: Robust Speech Recognition — Radford et al., OpenAI 2022 https://arxiv.org/abs/2212.04356 OpenAI's audio model trained on 680,000 hours. Thread angle: "Whisper is trained on 680,000 hours of audio and has 1.5B parameters. memd-net is trained on 2,000 clips and has ~30K parameters. Here's what the 50,000x parameter difference buys you — and what it costs. Why sometimes the small model is the right model." Good contrast post that contextualises memd against frontier work without being defensive.


Deep Residual Learning for Image Recognition (ResNet) - He et al., Microsoft 2015 https://arxiv.org/abs/1512.03385 Skip connections. Addressed the degradation problem, where adding layers made very deep networks harder to optimise. Changed deep learning. Thread angle: "Before ResNet, deeper networks were worse than shallow ones. Adding layers hurt accuracy. Here's the one diagram that explains why, and why skip connections are obvious in hindsight but took years to discover."


Distilling the Knowledge in a Neural Network - Hinton et al., 2015 https://arxiv.org/abs/1503.02531 Knowledge distillation. Large teacher model trains small student model. Thread angle: "The paper that popularised the idea of one model teaching another. Directly applicable to memd: I could use a large pretrained audio model as a teacher to improve the accuracy of the 80KB memd-net. Here's how soft labels work and why they're better than hard labels." This technique is worth actually trying on memd-net after basic training is working.


On the Opportunities and Risks of Foundation Models — Bommasani et al., Stanford 2021 https://arxiv.org/abs/2108.07258 The 200-page paper that coined "foundation models." Thread angle: Don't summarise 200 pages. Pick the Privacy section specifically. "What Stanford's foundation models paper says about always-on AI and privacy — and why memd's fully local architecture is a direct response to those concerns." The privacy angle connects directly to memd. One non-ML paper in your reading list signals you think beyond the technical.


Neural Ordinary Differential Equations — Chen et al., NeurIPS 2018 https://arxiv.org/abs/1806.07366 Neural networks as continuous dynamical systems. Conceptually wild. Thread angle: "What if neural networks didn't have layers? What if the depth was continuous? This paper defines neural nets as ODEs. Nothing to do with memd — I just found it genuinely mind-bending and wanted to explain why." One paper that's just interesting for its own sake. Shows you read broadly, not just practically.


An Image is Worth 16x16 Words (ViT) — Dosovitskiy et al., Google Brain 2021 https://arxiv.org/abs/2010.11929 Vision transformer. No convolutions. Thread angle: "They removed every CNN inductive bias — locality, translation equivariance, all of it — and it still worked, with enough data. Here's what that means for how we understand what neural networks actually learn."


Writing References

Sagnik's spheni Reddit post https://www.reddit.com/r/developersIndia/comments/1qz54io/i_made_a_vector_search_engine_from_scratch_in_c/ Study the structure closely: problem → why from scratch → what I found → honest limitations. memd's blog posts follow this exact structure. Note: he links to his code, explains a specific technical finding, and states clearly what doesn't work. Do all three.

Sagnik's NPU writeup https://datavorous.notion.site/NPUs-what-and-where-they-break-367e1be246b480f6ba57cc3730eca4bd The framing: not "here's what NPUs are" but "here's where they break." memd's equivalent: "here's where per-tensor quantisation broke for audio, and what fixing it revealed about mel spectrogram energy distribution."

Show HN format reference Browse news.ycombinator.com with "Show HN" filter. Study successful posts — short title, specific claim, honest about limitations, responds to comments quickly. memd's HN title: Show HN: memd — ambient memory device on ESP32-S3, custom audio runtime, local knowledge graph, no cloud


What To Skip

  • K&R "The C Programming Language" — dense for self-teaching. Beej covers what you need faster.
  • Any YouTube C tutorial — too slow. You already program.
  • CS50 - content you already know.
  • TFLite Micro / Edge Impulse documentation — you're explicitly not using them. Reading them wastes time and tempts shortcuts.
  • Any "Introduction to Machine Learning" course — your portfolio shows you already know this.
  • Arduino IDE — use ESP-IDF directly. Arduino abstracts away the hardware details that are memd's contribution.

Reading Sequence

Week 1      Skill Trees + Hierarchical Knowledge (meta)
            Beej chapters 1-8
            Stanford Essential C pages 20-35
            Fayek speech processing tutorial
            MCUNet paper (full read)
            ESP-IDF Getting Started + Hello World

Week 2      Beej file I/O, fixed-width types
            INMP441 datasheet
            ESP32-S3 TRM chapter 3 (memory)
            ESC-50 dataset: download + explore
            Speech Commands: download

Week 3-4    Beej pointers chapter (twice)
            MobileNetV1 skim
            PyTorch saving/loading models
            Start memd-net training
            (C learning by doing from here)

Month 2     Jacob et al. quantisation paper
            Nagel et al. Qualcomm white paper
            GGML quants source
            Darknet conv source
            Implement float32 runtime

Month 3     INT8 quantisation implementation
            ESP-DL source (for benchmark context)
            ESP32-S3 port
            → Blog Post 1

Month 4     Han et al. Deep Compression
            MLPerf Tiny benchmark paper
            Hello Edge paper
            LEAF paper
            Memory system + backend

Month 4+    Landmark paper threads (one/week):
            Attention Is All You Need
            MemGPT
            Generative Agents
            Whisper
            Sparks of AGI
            Others as interest develops

Month 4+    Semantic Scholar search-driven reading
            from your actual memd build problems

Last updated: June 2026
Project: memd (ESP32-S3-N16R8 + INMP441 + Custom C Audio Inference + Knowledge Graph)
Cross-reference: roadmap.md (what to build), checkpoints.md (milestones + limitations), prompt.md (full context)