
Google Meet Media API - Technical Analysis for Recording Application

CRITICAL FINDING: Wrong Repository

The /Users/maro/Projects/codeseed/Google-Meet-Bot repository is NOT the official Google Meet Media API. It is a Python Selenium-based screen-automation bot that uses sounddevice to capture system audio, a completely different approach.

The official Google Meet Media API is:

  • GitHub: https://github.com/googleworkspace/meet-media-api-samples
  • Language: TypeScript (web) and C++ reference implementations
  • Status: Developer Preview (requires enrollment in the Google Workspace Developer Preview Program)


1. TypeScript Reference Client Structure

Repository Organization (web/samples directory):

web/
├── samples/          # TypeScript reference implementation
│   ├── src/          # Source TypeScript files
│   ├── app.yaml      # Google App Engine deployment config
│   ├── webpack       # Build tool for bundling
│   └── package.json  # Node.js dependencies
├── external/         # WebRTC dependencies
└── cpp/             # C++ reference client (alternative)

Key TypeScript Modules/Classes:

MeetMediaApiClient (Primary Interface)

interface MeetMediaApiClient {
  // Connection management
  joinMeeting(config: MeetMediaClientRequiredConfiguration): Promise<void>
  leaveMeeting(): Promise<void>

  // Layout/video assignment
  createMediaLayout(request: MediaLayoutRequest): MediaLayout
  applyLayout(layout: MediaLayout): Promise<void>

  // Observable properties (reactive data streams)
  mediaEntries: Subscribable<MediaEntry[]>
  meetStreamTracks: Subscribable<MeetStreamTrack[]>
  participants: Subscribable<Participant[]>
  presenter: Subscribable<MediaEntry | undefined>
  screenshare: Subscribable<MediaEntry | undefined>
  sessionStatus: Subscribable<MeetSessionStatus>
}

MeetMediaClientRequiredConfiguration

interface MeetMediaClientRequiredConfiguration {
  accessToken: string           // OAuth token
  meetingSpaceId: string        // Meeting ID from Meet REST API
  enableAudioStreams: boolean   // Enable audio capture
  numberOfVideoStreams: number  // 1-3 video streams
  logsCallback: (logEvent: LogEvent) => void  // Logging handler
}

MediaEntry (Participant Stream Data)

interface MediaEntry {
  participant: BaseParticipant  // Participant metadata
  session: string               // Session ID
  sessionName: string           

  // Audio
  audioMeetStreamTrack: MeetStreamTrack | undefined
  audioMuted: boolean

  // Video
  videoMeetStreamTrack: MeetStreamTrack | undefined
  videoMuted: boolean

  // State
  isPresenter: boolean
  screenShare: boolean
  mediaLayout: MediaLayout
}

MeetStreamTrack (Actual Media Stream)

interface MeetStreamTrack {
  mediaEntry: MediaEntry
  mediaStreamTrack: MediaStreamTrack  // Standard WebRTC MediaStreamTrack
}

2. Audio/Video Stream Capture & Processing

WebRTC Architecture:

The API uses WebRTC with Virtual Streams managed by a Selective Forwarding Unit (SFU):

Audio Streams:

  • Exactly 3 audio virtual streams (fixed requirement)
  • Each stream has a static SSRC (Synchronization Source)
  • CSRC headers identify the true speaker when streams switch
  • Opus codec required (48kHz, with inband FEC)
  • Dynamic speaker switching: SFU sends 3 loudest speakers
// Configure 3 audio transceivers in the offer
pc.addTransceiver('audio', { direction: 'recvonly' });
pc.addTransceiver('audio', { direction: 'recvonly' });
pc.addTransceiver('audio', { direction: 'recvonly' });

Video Streams:

  • 1-3 video virtual streams (configurable)
  • VP8, VP9, and AV1 codecs required
  • Dynamic participant switching based on relevance
  • Can request specific participants via Video Assignment API
// Configure video transceivers
pc.addTransceiver('video', { direction: 'recvonly' });
pc.addTransceiver('video', { direction: 'recvonly' });
pc.addTransceiver('video', { direction: 'recvonly' });

Stream Processing Flow:

  1. WebRTC Offer/Answer: Client creates SDP offer → Send to connectActiveConference() → Receive SDP answer
  2. ICE Connection: Establish peer connection to Meet servers
  3. Data Channels: Open ordered data channels (session-control, media-stats, participants, media-entries, video-assignment)
  4. Media Arrival: Subscribe to meetStreamTracks for MediaStreamTrack objects
  5. Extract Frames: Attach tracks to a <video> element, record them with MediaRecorder, or read raw frames with MediaStreamTrackProcessor

3. Participant Tracking & Speaker Identification

Participant Data Structure:

interface BaseParticipant {
  name: string                    // Resource name
  participantKey: string          // Unique participant ID

  // User identity (one of):
  signedInUser?: {
    user: string                  // Email/user ID
    displayName: string
  }
  anonymousUser?: {
    displayName: string
  }
  phoneUser?: {
    displayName: string
  }
}

CSRC-Based Speaker Identification:

  • CSRC (Contributing Source): 32-bit identifier in RTP packet headers
  • Unique per participant, constant during their session
  • Maps to participant via MediaEntry.participant
// MediaEntry contains CSRC mapping
interface MediaEntry {
  participant: BaseParticipant  // Links to participant metadata
  audioCsrc?: number            // Audio CSRC value
  videoCsrcs?: number[]         // Video CSRC values
  audioMuted: boolean
  videoMuted: boolean
}

Real-time Tracking:

// Subscribe to participant updates
client.participants.subscribe((participants: Participant[]) => {
  participants.forEach(p => {
    // displayName lives on the user-identity variant, not on BaseParticipant itself
    const user = p.participant.signedInUser ?? p.participant.anonymousUser ?? p.participant.phoneUser;
    console.log('Participant:', user?.displayName);
    p.mediaEntries.forEach(entry => {
      console.log('  CSRC:', entry.audioCsrc, 'Muted:', entry.audioMuted);
    });
  });
});

// Subscribe to media entries (active streams)
client.mediaEntries.subscribe((entries: MediaEntry[]) => {
  entries.forEach(entry => {
    if (entry.audioMeetStreamTrack) {
      // This participant is currently audible (one of the 3 loudest)
      const user = entry.participant.signedInUser ?? entry.participant.anonymousUser ?? entry.participant.phoneUser;
      console.log('Active speaker:', user?.displayName);
    }
  });
});
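In a browser client, the CSRC values currently present on an audio receiver can be read with the standard `RTCRtpReceiver.getContributingSources()` call; resolving them to display names is then a pure lookup. A minimal sketch under that assumption (the `csrcToName` map and `resolveActiveSpeakers` helper are illustrative, not part of the Meet API):

```typescript
// Resolve the display names behind the CSRCs an audio receiver reports.
// RTCRtpContributingSource.source carries the 32-bit CSRC value.
interface ContributingSourceLike {
  source: number;       // CSRC value from the RTP header
  audioLevel?: number;  // 0.0-1.0, when the browser exposes it
}

function resolveActiveSpeakers(
  sources: ContributingSourceLike[],
  csrcToName: Map<number, string>
): string[] {
  return sources
    .map(s => csrcToName.get(s.source))
    .filter((name): name is string => name !== undefined);
}

// In a live client this would be polled per audio receiver:
//   const sources = receiver.getContributingSources();
//   const speakers = resolveActiveSpeakers(sources, csrcToName);
```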

4. WebRTC Data Channel Architecture

Required Data Channels (all must be ordered):

session-control

  • Purpose: Session lifecycle management
  • Messages: Leave requests/responses, session status updates
    sessionControlChannel = pc.createDataChannel('session-control', {ordered: true});
    

media-stats

  • Purpose: Upload WebRTC statistics for diagnostics
  • Configuration: Server sends upload interval and allowlist
    mediaStatsChannel = pc.createDataChannel('media-stats', {ordered: true});
    

participants (Server-initiated)

  • Purpose: Receive participant join/leave events
  • Messages: ParticipantsChannelToClient with participant resources

media-entries (Server-initiated)

  • Purpose: Receive media stream assignments
  • Messages: MediaEntriesChannelToClient with media entry resources

video-assignment (Optional)

  • Purpose: Request specific participants' video
  • Messages: SetVideoAssignmentRequest with canvas assignments

Data Channel Protocol:

  • Format: JSON-encoded protobuf-like structures
  • Bidirectional: Client sends requests, server sends resources
  • Resource Snapshots: Full state + incremental deltas
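The snapshot-plus-delta pattern boils down to keeping a keyed map of resources and applying each channel message on top of it. A sketch of that bookkeeping (the `ResourceUpdate` shape with `resources` and `deletedResources` fields is an illustrative assumption, not the exact wire format):

```typescript
// Maintain local resource state from snapshot + delta messages
// arriving on a server-initiated data channel.
interface Resource {
  name: string;                 // resource name, used as the map key
  [field: string]: unknown;
}

interface ResourceUpdate {
  resources?: Resource[];       // new or updated resources
  deletedResources?: string[];  // names of removed resources
}

function applyUpdate(state: Map<string, Resource>, update: ResourceUpdate): void {
  for (const r of update.resources ?? []) {
    state.set(r.name, r);       // insert or overwrite
  }
  for (const name of update.deletedResources ?? []) {
    state.delete(name);
  }
}

// Wiring in a live client:
//   channel.onmessage = (e) => applyUpdate(participants, JSON.parse(e.data));
```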

5. Incremental Recording vs Full Download

Current API Limitations:

The Meet Media API does NOT provide built-in incremental S3 upload. You must implement this yourself using:

Approach 1: MediaRecorder API (Browser)

const mediaRecorder = new MediaRecorder(mediaStream, {
  mimeType: 'video/webm;codecs=vp9,opus',
  videoBitsPerSecond: 2500000
});

let chunkNumber = 0;
mediaRecorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    // Upload chunk to MinIO incrementally
    await uploadToMinIO(event.data, `recording-chunk-${chunkNumber++}.webm`);
  }
};

// Request data every 5 seconds
mediaRecorder.start(5000);
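Browser support for WebM codec combinations varies, so the mimeType above should be negotiated rather than hard-coded. A small helper using the standard `MediaRecorder.isTypeSupported` feature test (the candidate list is an assumption; the feature test is injected so the helper also works outside a browser):

```typescript
// Pick the first MIME type the current browser can record.
// In a page, pass MediaRecorder.isTypeSupported as the feature test.
function pickRecorderMimeType(
  isTypeSupported: (mime: string) => boolean,
  candidates: string[] = [
    'video/webm;codecs=vp9,opus',
    'video/webm;codecs=vp8,opus',
    'video/webm',
  ]
): string | undefined {
  return candidates.find(isTypeSupported);
}

// Usage in a browser:
//   const mimeType = pickRecorderMimeType(t => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
```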

Approach 2: Custom Frame Processing

// Process raw video frames (MediaStreamTrackProcessor is part of the
// insertable-streams API and currently Chromium-only)
const videoTrack = mediaStreamTrack;  // a video-kind MediaStreamTrack
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const reader = processor.readable.getReader();

while (true) {
  const {value: frame, done} = await reader.read();
  if (done) break;

  // Extract frame data, encode, upload to MinIO
  const imageData = await extractFrameData(frame);
  await uploadFrameToMinIO(imageData);
  frame.close();
}

Approach 3: Server-Side Muxing (Node.js)

Run a Node.js/C++ server that:

  1. Joins the meeting using the Meet Media API
  2. Receives the WebRTC streams
  3. Uses FFmpeg to mux the streams into containers
  4. Streams chunks to MinIO S3 using multipart upload

// Node.js example with fluent-ffmpeg and the AWS SDK v3 (works against MinIO)
import ffmpeg from 'fluent-ffmpeg';
import { S3Client, CreateMultipartUploadCommand, UploadPartCommand } from '@aws-sdk/client-s3';

// Fragmented MP4 can be piped as it is produced; the segment muxer
// cannot, because it writes separate output files.
const output = ffmpeg()
  .input(audioStream)
  .input(videoStream)
  .outputOptions([
    '-movflags frag_keyframe+empty_moov',
    '-f mp4'
  ])
  .on('end', () => console.log('Recording complete'))
  .pipe();  // returns a readable PassThrough stream

// Stream chunks to MinIO as multipart upload parts.
// Note: S3 requires every part except the last to be at least 5 MiB,
// so real code must buffer chunks up to that size before uploading.
let partNumber = 1;
output.on('data', async (chunk) => {
  await s3Client.send(new UploadPartCommand({
    Bucket: 'recordings',
    Key: `meeting-${meetingId}.mp4`,
    UploadId: uploadId,  // obtained from a prior CreateMultipartUploadCommand
    PartNumber: partNumber++,
    Body: chunk
  }));
});

6. Authentication Flow

OAuth 2.0 Requirements:

Scopes:
- https://www.googleapis.com/auth/meetings.conference.media.readonly
- https://www.googleapis.com/auth/meetings.space.readonly

Flow:

  1. Get OAuth Token: Use implicit grant flow (web) or service account (server)
  2. Create/Get Meeting Space: Use Meet REST API to get meeting space ID
  3. Join Conference: Call connectActiveConference() with OAuth token in request
  4. Maintain Token: Refresh before expiry (typically 1 hour)

TypeScript Example:

// 1. OAuth (implicit grant for web)
const oauth2Endpoint = 'https://accounts.google.com/o/oauth2/v2/auth';
const params = {
  client_id: YOUR_CLIENT_ID,
  redirect_uri: YOUR_REDIRECT_URI,
  response_type: 'token',
  scope: 'https://www.googleapis.com/auth/meetings.conference.media.readonly ' +
         'https://www.googleapis.com/auth/meetings.space.readonly'
};

// 2. Resolve the meeting code to a space resource name
// (spaces.get accepts the meeting code in place of the space ID)
const meetingCode = 'abc-defg-hij';
const response = await fetch(
  `https://meet.googleapis.com/v2/spaces/${meetingCode}`,
  {headers: {Authorization: `Bearer ${accessToken}`}}
);
const {name} = await response.json(); // spaces/XYZ

// 3. Connect to conference
const client = new MeetMediaApiClient();
await client.joinMeeting({
  accessToken: accessToken,
  meetingSpaceId: name,
  enableAudioStreams: true,
  numberOfVideoStreams: 3,
  logsCallback: (log) => console.log(log)
});

7. Important Gotchas & Limitations

🚨 Critical Limitations:

  1. Developer Preview Only
     • Requires enrollment in the Google Workspace Developer Preview Program
     • All participants must also be enrolled
     • Not production-ready

  2. Consumer Meeting Restrictions
     • For Gmail (@gmail.com) meetings, the meeting organizer must be present to consent
     • The bot's connection is rejected if the organizer leaves

  3. Security Restrictions
     • Cannot join encrypted meetings
     • Cannot join meetings with watermarks
     • Cannot join if underage accounts are present

  4. Virtual Stream Caps
     • Audio: exactly 3 streams (no more, no fewer)
     • Video: 1-3 streams maximum
     • No access to all participants simultaneously in large meetings

  5. CORS Requirements
     • The web client must be deployed (localhost won't work)
     • Must use HTTPS
     • Recommended: deploy to Google App Engine

  6. Codec Requirements
     • Must support Opus (audio) and VP8/VP9/AV1 (video)
     • H.264 is supported but not required

  7. No Built-in Recording
     • The API provides raw streams only
     • You must implement your own recording/muxing/upload logic
     • No native support for multi-speaker unmixing

  8. Participant Switching
     • Active speakers change dynamically
     • CSRC values must be tracked to identify who is speaking
     • No guarantee that all participants appear in the streams

  9. Data Channel Ordering
     • All data channels must be ordered (unordered is not supported)

  10. Meeting Space Lifetime
     • Meeting codes expire after 365 days of inactivity
     • Must use the meeting space resource name, not just the code
8. Recommended Architecture

Based on your requirements (TypeScript, MinIO S3, speaker tracking, Docker):

Architecture:

┌─────────────────────────────────────────────────────────┐
│  Frontend (TypeScript/React)                            │
│  - OAuth authentication                                 │
│  - Meeting code input                                   │
│  - Recording controls                                   │
│  - Real-time participant display                        │
└─────────────────┬───────────────────────────────────────┘
                  │ REST API
┌─────────────────▼───────────────────────────────────────┐
│  Backend (Node.js/TypeScript + Express)                 │
│  - Manage OAuth tokens                                  │
│  - Create recording sessions                            │
│  - Route to recording workers                           │
└─────────────────┬───────────────────────────────────────┘
                  │ Job Queue
┌─────────────────▼───────────────────────────────────────┐
│  Recording Worker (Node.js + WebRTC)                    │
│  - Join meeting using Meet Media API                    │
│  - Capture 3 audio + 3 video streams                    │
│  - Track participants via CSRC                          │
│  - Mux streams using FFmpeg                             │
│  - Upload chunks to MinIO (multipart)                   │
└─────────────────┬───────────────────────────────────────┘
                  │ S3 Protocol
┌─────────────────▼───────────────────────────────────────┐
│  MinIO S3 Storage                                       │
│  - Bucket: recordings                                   │
│  - Structure: /{meetingId}/{timestamp}-chunk-{n}.mp4    │
│  - Metadata: speaker CSRC mappings                      │
└─────────────────────────────────────────────────────────┘

Key TypeScript Classes:

// recording-worker.ts
class RecordingWorker {
  private client: MeetMediaApiClient;
  private recorder: StreamRecorder;
  private s3Uploader: S3ChunkUploader;
  private participantTracker: ParticipantTracker;

  async startRecording(meetingSpaceId: string, accessToken: string) {
    await this.client.joinMeeting({accessToken, meetingSpaceId, /* …remaining config… */});

    // Subscribe to streams
    this.client.meetStreamTracks.subscribe(tracks => {
      tracks.forEach(track => this.recorder.addTrack(track.mediaStreamTrack));
    });

    // Track speakers
    this.client.mediaEntries.subscribe(entries => {
      this.participantTracker.updateSpeakers(entries);
    });

    // Start recording and upload
    this.recorder.start((chunk, metadata) => {
      this.s3Uploader.uploadChunk(chunk, metadata);
    });
  }
}

// participant-tracker.ts
class ParticipantTracker {
  private csrcToParticipant = new Map<number, BaseParticipant>();

  updateSpeakers(entries: MediaEntry[]) {
    entries.forEach(entry => {
      if (entry.audioCsrc !== undefined) {
        this.csrcToParticipant.set(entry.audioCsrc, entry.participant);
      }
    });
  }

  getSpeaker(csrc: number): BaseParticipant | undefined {
    return this.csrcToParticipant.get(csrc);
  }
}

// s3-chunk-uploader.ts
import { S3Client, CreateMultipartUploadCommand, UploadPartCommand } from '@aws-sdk/client-s3';

class S3ChunkUploader {
  private partNumber = 1;

  constructor(
    private s3Client: S3Client,
    private uploadId: string  // from a prior CreateMultipartUploadCommand
  ) {}

  async uploadChunk(chunk: Blob, metadata: RecordingMetadata) {
    // Per-part metadata is not supported by UploadPart; attach the speaker
    // CSRC mappings on the CreateMultipartUpload call or in a sidecar object.
    await this.s3Client.send(new UploadPartCommand({
      Bucket: 'recordings',
      Key: `${metadata.meetingId}/${metadata.timestamp}.mp4`,
      UploadId: this.uploadId,
      PartNumber: this.partNumber++,
      Body: Buffer.from(await chunk.arrayBuffer())
    }));
  }
}
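S3 multipart uploads also require bookkeeping beyond sending parts: part numbers must be sequential starting at 1, and the ETag returned for every part must be collected for the final CompleteMultipartUpload call. That bookkeeping is pure logic and can live apart from the SDK; a hypothetical helper (not part of any library):

```typescript
// Track part numbers and ETags for an S3 multipart upload.
// Parts are numbered from 1; S3 additionally requires every part except
// the last to be at least 5 MiB, which callers enforce when buffering.
class MultipartPartTracker {
  private parts: { PartNumber: number; ETag: string }[] = [];

  nextPartNumber(): number {
    return this.parts.length + 1;
  }

  recordPart(partNumber: number, etag: string): void {
    this.parts.push({ PartNumber: partNumber, ETag: etag });
  }

  // Shape expected by CompleteMultipartUploadCommand's MultipartUpload field.
  completedParts(): { PartNumber: number; ETag: string }[] {
    return [...this.parts].sort((a, b) => a.PartNumber - b.PartNumber);
  }
}
```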

Summary & Recommendations

✅ What You Can Do:

  • Real-time access to 3 audio + 1-3 video streams
  • Track active speakers via CSRC headers
  • Implement incremental S3 upload using multipart upload
  • Build TypeScript full-stack with Node.js backend
  • Deploy in Docker containers

❌ What You Cannot Do (with current API):

  • Access all participants simultaneously (limited to 3 audio, 3 video)
  • Use in production without Developer Preview enrollment
  • Join consumer meetings without organizer present
  • Record without implementing custom muxing/storage

🚀 Recommended Next Steps:

  1. Enroll in the Google Workspace Developer Preview Program
  2. Clone official repo: git clone https://github.com/googleworkspace/meet-media-api-samples
  3. Study web/samples TypeScript reference client
  4. Build POC with single stream → MinIO upload
  5. Implement speaker tracking with CSRC mapping
  6. Add FFmpeg-based muxing for multi-stream recording
  7. Integrate with STT service (feed audio chunks directly)

📚 Essential Resources:

  • API Concepts: https://developers.google.com/workspace/meet/media-api/guides/concepts
  • TypeScript Quickstart: https://developers.google.com/workspace/meet/media-api/guides/ts
  • GitHub Samples: https://github.com/googleworkspace/meet-media-api-samples
  • WebRTC Primer: https://webrtcforthecurious.com/
  • Meet REST API: https://developers.google.com/workspace/meet/api/guides/overview
  • Virtual Streams: https://developers.google.com/workspace/meet/media-api/guides/virtual-streams
  • Data Channels Reference: https://developers.google.com/workspace/meet/media-api/reference/dc/media_api

Date

Created: December 16, 2025

Project Context

This analysis is for building a Google Meet recording application with:

  • TypeScript full-stack architecture
  • Incremental stream recording to MinIO S3
  • Speaker identification tracking
  • Docker deployment
  • Future integration with Speech-to-Text (STT)